Abstract

In the rich and growing literature on diffusion and cascade effects in social networks, it 
is assumed that a node's actions are influenced only by its immediate neighbors in the social 
network. However, there are other contexts in which this highly-local view of influence is not 
applicable. The diffusion of technologies in communication networks is one important example; 
here, a node's actions should also be influenced by remote nodes that it can communicate with 
using the new technology. 

We propose a new model of technology diffusion inspired by the networking literature on 
this topic. Given the communication network G(V, E), we assume that node u activates (i.e., 
deploys the new technology) when it is adjacent to a connected component of active nodes in 
G of size exceeding node u's threshold θ(u). We focus on an algorithmic problem that is well 
understood in the context of social networks, but thus far has only heuristic solutions in the 
context of communication networks: determining the smallest seedset of early-adopter nodes 
that, once activated, trigger a cascade that eventually causes all other nodes in the network to 
activate as well. Our main result is a near-optimal approximation algorithm that returns a 
seedset that is an O(rℓ log|V|)-factor larger than the optimal seedset, where r is the graph 
diameter and each node's threshold can take on one of at most ℓ possible values. Our results 
highlight the substantial algorithmic difference between our problem and the work in diffusion 
on social networks. 
1 Introduction 



Cascade effects provide a simple and effective way to drive global diffusion of a new technology 
in a network: after a few well chosen seed nodes are convinced to adopt the technology, more and 
more nodes make local decisions to adopt the technology until eventually everyone in the network 
has adopted it. Given the complexity and expense involved in persuading a large, dispersed network 
of nodes to adopt a new technology, a particularly important algorithmic problem is to determine 
the smallest possible seedset of early adopter nodes, and thus also the "cheapest" way to drive a 
cascade that leads to global adoption [TTl [23] . 

Diffusion models are predicated on a model of node utility; namely, the benefit an individual 
node obtains when it decides to adopt the technology. In the rich literature on cascade effects 
in social networks (see e.g., [I5l EQl [28l [HI [23] and subsequent works), the model of node utility 
is highly local — it depends on only a node's "friends" or immediate neighbors in the social 
network. However, there are many interesting contexts where this highly-local model of utility is 
not applicable. 

Communication networks. We are particularly inspired by the example of diffusion of 
communications technologies in networks like the Internet; here, a node's utility should depend not 
only on its immediate neighbors, but also on the number of (possibly distant) nodes that it can 
communicate with using the new technology. There has been significant interest in the networking 
community in the impact of cascade effects on technology diffusion (see Section 1.4). Motivated 
by this research, in this paper we propose a new model of technology diffusion. Central to our 
model is the idea of non-local utility, which is present in much of the literature on communication 
technologies, e.g., [6l [3 [Ml [291 [2Q1 [H [H]. 

Utility as connected components. We say a node is inactive if it uses an older version 
of the technology, and that it activates once it deploys the new, improved version. In our model, node 
utility depends on the size of the connected components of active nodes adjacent to node u in G, 
i.e., on the size of the connected component containing u in the subgraph of G(V, E) induced by 
{u} ∪ {v ∈ V : node v is active}. This model captures the following two natural ideas: 

1. A pair of nodes u, v may communicate using the new technology only if there is a path from 
u to v in G consisting only of active nodes. This property characterizes many important 
networking technologies (see Section 1.4). 

2. A node's utility should depend on the number of other nodes it can communicate with using 
the new technology. This idea is known in the popular literature as Metcalfe's Law, which 
states that the utility that a single user gets from being part of a network of n users scales as 
n [27], and is also in line with traditional ideas in economics, e.g., [22]: "[t]he benefits that a 
consumer derives ... depend on how many other consumers ultimately purchase compatible 
units, ... in other words, ... [it] depends only on the final network sizes." 

Given this model of node utility, we consider the algorithmic problem of choosing the smallest 
seedset of early-adopter nodes that, once activated, can cause a cascade that leads to global adoption 
of the new technology. This problem is particularly important in the context of communication 
networks, where nodes typically represent profit-maximizing Internet service providers that must 
be convinced by governments or standards bodies to adopt a new technology (see e.g., [6l [3[ [T4l [29]). 
1.1 Our setting. 

We assume that the underlying network structure G(V, E) is fixed, and consider a progressive 
technology diffusion process: a node starts out as inactive (using an older version of the technology) 
and activates (adopts its new, improved version) once it obtains sufficient utility from the new 
technology. Once a node is active, it can never become inactive. To model the cost of technology 
deployment, we associate a threshold θ(u) with each node u that determines how large its utility 
should be before it is willing to activate. We assume that a node u activates when its utility 
reaches θ(u), i.e., if the connected component containing node u in the subgraph induced by 
{v ∈ V : node v is active} ∪ {u} has size at least θ(u).¹ 

Optimization problem. Given a graph G = (V, E) and a deterministic threshold function 
θ : V → {θ_1, ..., θ_ℓ}, our goal is to find the smallest seedset S that is feasible, i.e., when the nodes in S are 
activated, every other node in the graph eventually activates as well. 

1.2 Our results. 

Our main result, stated precisely as Theorem 3.3, is an approximation algorithm: 

Theorem 1.1 (Main result). Consider a technology diffusion problem (G(V, E), θ) where the 
optimal seedset has size opt, the graph has diameter r (i.e., r is the length of the "longest shortest path" 
in G), and there are at most ℓ possible threshold values, i.e., θ : V → {θ_1, ..., θ_ℓ}. Then there exists 
a polynomial-time algorithm that returns a feasible seedset S of size O(rℓ log|V| · opt). 

Exposition. Our algorithm, which uses linear programming, is based on two key ideas discussed 
in detail in Section 3.1: 

1. Linearization. The non-local nature of our utility function makes it challenging to encode 
our problem as an integer program. Nevertheless, we observe that this function can be 
encoded using only local constraints if we restrict our search space to seedsets that give rise 
to connected activation sequences, i.e., seedsets that ensure that the set of active nodes induces 
a connected subgraph of G at every point in the diffusion process. We then show that this 
restriction means that our IP returns a 2-approximation to the optimal seedset. 

2. Randomized rounding using network flows. Given the relaxation of the IP, we circumvent 
a potentially large integrality gap (Appendix A) by designing a novel randomized rounding 
algorithm that simultaneously interprets the fractional values returned by the linear program 
(LP) as network flows and probabilities. For this to work, we need to further restrict our 
search space; now we require the seedset itself to be connected, i.e., the nodes in the seedset 
induce a connected subgraph of G, and thus we obtain an O(rℓ log|V|) approximation to the 
optimal seedset. 

On the optimality of our results. For the wide range of problem instances where r, ℓ = 
O(log|V|), our algorithm gives a polylog(|V|)-approximation (namely O(log³|V|)), and when r, ℓ = O(1) 
it gives an O(log|V|)-approximation. The following lower bound (Section 4) 
indicates that our results are also near optimal for these instances: 

¹Note that we can accommodate models of network value other than Metcalfe's Law by scaling the thresholds, 
e.g., for the Odlyzko-Tilly Law [5], replace θ with e^θ. 
Lemma 1.2. The technology diffusion optimization problem does not admit any o(ln|V|)-approximation 
algorithm, even when the graph has a constant diameter r and the number of threshold values ℓ is a 
constant. Furthermore, the result holds even if we require the seedset to be connected. 

Indeed, r and ℓ are O(log|V|) in many interesting settings: 

Graph diameter r. Empirical graphs of communication networks like the Internet exhibit very 
small diameter: the autonomous system graph used in has 37K nodes and diameter 11, while a 
router-level (skitter) graph from 2005 has 1.7M nodes and diameter 25 [26]. In fact, [26] has provided 
empirical evidence that these graph diameters actually shrink as the graph grows. Moreover, there 
is a large class of random graphs that have diameter r = 0(log|y|), e.g., Erdos-Renyi random 
graph family, the preferential attachment graph family p^j, etc. 

We can also show that our algorithm's dependence on r arises because it is restricted to 
returning connected seedsets. The following unconditional lower bound suggests that circumventing 
polynomial dependencies on r requires an algorithm that can return disconnected seedsets, and 
likely also a different set of techniques: 

Lemma 1.3. For any fixed integer r, there exists an instance of the technology diffusion problem (G, θ) 
such that (a) the diameter of G is Θ(r), and (b) the optimal connected seedset is a factor of Ω(r) 
larger than the optimal seedset. 

Threshold granularity ℓ. Parameter ℓ is a natural restriction on the granularity of the threshold 
function θ. [14] have argued that it is natural to restrict the granularity of θ, given the difficulty of 
obtaining empirical data on technology deployment costs relative to utility for every single node in 
the graph. Indeed, the literature often deals with this difficulty by simply assuming that all nodes 
have the same θ values [281 [29| [T^ [6], or by drawing θ randomly from some distribution [23]. 

Beyond heuristics. Given the prevalence of heuristics in the literature (see Section 1.4), one 
might wonder if the seedsets returned by our algorithms are actually any better than heuristic 
solutions. To give evidence that our approach finds solutions that are not found by heuristics, 
in Section 5 we run our IP on a few small problem instances and find that it does indeed return 
solutions that are different from (and often substantially better than) those of several natural heuristics. 

1.3 Relationship to the linear threshold model in social networks. 

One inspiration for our model was the linear threshold model for social networks, articulated in 
[23] and appearing in many other works. Indeed, we diverge from the linear threshold model only 
in our choice of utility function; ours is non-local, while theirs assumes a node's utility is given 
by the (weighted) sum of its active neighbors in G. Despite the superficial similarities, there is a 
substantial algorithmic difference between these models. [7] considers finding an optimal feasible 
seedset (i.e., the smallest seedset that activates all nodes in the graph) in the linear threshold model in 
social networks where thresholds are fixed and deterministic, and shows that it is NP-hard to find 
a seedset of size O(2^{log^{1−ε}|V|}) · opt for any ε > 0; his result holds even if r, ℓ = O(1). In contrast, if 
r, ℓ = O(1), then our main result shows that our model admits an O(log|V|)-approximation algorithm. 

Moreover, [23] worked around this discouraging approximation-hardness result by assuming 
that thresholds are chosen uniformly and independently at random after the seedset is selected. 
This way, they could show the submodularity of the "influence function", i.e., the expected number 
of nodes activated by a seedset S, and therefore use greedy algorithms to find a seedset whose 
expected influence is within a factor (1 − 1/e − ε) of the best seedset of the same size. In contrast, we show in Appendix F that this submodularity property fails 
to hold in our setting, highlighting another difference between our model and theirs. In fact, we 
show our influence function is neither submodular nor supermodular, even if we (a) randomize the 
thresholds as in [23], or limit ourselves to (b) graphs of constant radius, or (c) a constant number 
of threshold values. Moreover, we see neither diminishing, nor increasing marginal returns even if 
we restrict ourselves to (d) connected seedsets. 

1.4 Related networking research 

In addition to the long line of work on diffusion in social networks (e.g., \15\ [301 l28l [TT| [23] 
and many others), the networking community has been grappling with the problem of technology 
upgrades in the Internet for many years; see e.g., [HI El [31 dH [291 EOl HSl [121 [HI [21]. A large 
number of networking technologies are characterized by the property we described above: that a 
pair of active nodes can only communicate using the new technology if they have a path between 
them consisting only of active nodes. The most obvious example is secure Internet routing [24, 25]. 
Here, cryptographically-signed routing messages may only be propagated on paths where each and 
every node is secure [251 HI]. This property is also shared by protocols like interdomain Quality 
of Service (QoS) [18], fault localization [4], denial of service (DoS) prevention [32], and, to a lesser 
extent, IPv6 [TO].² 

A number of works have used simulation studies to understand the relationship between seedset 
selection and cascading technology adoption (e.g., [291 El El 121]). Most of these works, with the 
exception of [14], have sidestepped the question of choosing an optimal seedset and have gone 
directly to using heuristics (often, "choose the high-degree nodes"). [H] study secure routing 
protocol deployment in a realistic routing model, and after showing that it is NP-hard to find a 
constant approximation of an optimal seedset, move on to heuristics as well. While our work 
presents a more stylized model of the diffusion process, to our knowledge it is also the first 
to provide an approximation algorithm with worst-case guarantees when utilities are non-local. 

2 Formal Statement of Our Model 

Definition 2.1 (Technology diffusion process). Let G = (V, E) be a connected undirected graph. 
Let S be the seedset, an arbitrary subset of V. Let θ be the threshold function of G, which maps 
V to {θ_1, ..., θ_ℓ}, where each θ_i ∈ {2, ..., n}. The technology diffusion process on (G, θ) with respect 
to the seedset S is a family of functions {f_t : V → {0, 1} | t ∈ ℕ} such that 

• When t = 0, f_0(u) = 1 if and only if u ∈ S. 

• When t ≥ 1, f_t(u) = 1 if and only if 

1. f_{t−1}(u) = 1, or 

2. in the subgraph induced by {v : f_{t−1}(v) = 1} ∪ {u}, the size of the connected component 
that contains u is at least θ(u). 

²There are technologies that allow IPv6 messages to be tunnelled from one disconnected IPv6-enabled component 
of the network to another. However, tunnelling often incurs unacceptable performance penalties [191 ITS], so we may 
think of utility as the number of IPv6-enabled destinations a node can reach without tunnelling. (The other 
technologies we mention do not support tunnelling.) 
Thus, f_t(u) is u's status at the t-th timestep; when f_t(u) = 1, we say u is turned on or is activated 
at the t-th timestep; when f_t(u) = 0, we say u is off or is inactive at the t-th timestep. 

Definition 2.2 (Technology diffusion optimization problem). We say that 
T = min{t : f_t = f_{t+1}} is the completion time of the diffusion process. We say S is a feasible 
seedset with respect to (G, θ) if {v : f_T(v) = 1} = V, i.e., all nodes are turned on by the process' 
completion time. Then, the technology diffusion optimization problem is to find the smallest feasible 
seedset S when G and θ are given as input. 
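To make Definitions 2.1 and 2.2 concrete, the following is a minimal Python sketch of the diffusion process, assuming the graph is given as an adjacency dictionary (node to set of neighbors) and θ as a dictionary from nodes to integers. The helper names component_size, diffuse and is_feasible are illustrative and do not come from the paper. Every update at step t reads only the active set f_{t−1}, so all activations within a step are simultaneous.

```python
from collections import deque

# Illustrative sketch of Definitions 2.1 and 2.2; names are not from the paper.


def component_size(adj, active, u):
    """Size of the component containing u in the subgraph induced by active ∪ {u}."""
    allowed = set(active) | {u}
    seen = {u}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in allowed and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen)


def diffuse(adj, theta, seeds):
    """Run the process of Definition 2.1; return [f_0, f_1, ..., f_T] as active sets."""
    active = frozenset(seeds)  # f_0: exactly the seeds are on
    history = [active]
    while True:
        # f_t(u) = 1 iff f_{t-1}(u) = 1 or u's component w.r.t. f_{t-1} has >= theta(u) nodes
        newly_on = {u for u in adj
                    if u not in active and component_size(adj, active, u) >= theta[u]}
        if not newly_on:  # f_T = f_{T+1}: completion time reached
            return history
        active = active | newly_on
        history.append(active)


def is_feasible(adj, theta, seeds):
    """S is feasible iff every node is on at the completion time (Definition 2.2)."""
    return diffuse(adj, theta, seeds)[-1] == frozenset(adj)
```

For example, on the path 0-1-2-3 with thresholds θ = (2, 2, 3, 4), seeding node 0 activates one new node per step until all four nodes are on, whereas seeding node 3 alone stalls immediately.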



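$$
\begin{aligned}
\min\quad & \sum_{i\in[n]}\ \sum_{t<\theta(u_i)} x_{i,t} \\
\text{subject to:}\quad
& x_{i,t}\in\{0,1\} && \forall t,i && \\
& \sum_{t\in[n]} x_{i,t} = 1 && \forall i && \text{(permutation constraints)} \\
& \sum_{i\in[n]} x_{i,t} = 1 && \forall t && \text{(permutation constraints)} \\
& x_{i,t} \le \sum_{\{i',i\}\in E}\ \sum_{t'<t} x_{i',t'} && \forall t>1,\ i && \text{(connectivity constraints)}
\end{aligned}
$$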
Figure 1: Simple IP for the technology diffusion optimization problem. 



3 Approximation Algorithm 

We start with a detailed overview of key technical ideas in Section 3.1. We then describe our 
IP formulation in Section 3.2 and prove its correctness in Section 3.3; see also Appendix A for a 
discussion of the integrality gap that could arise in an alternate (simpler) version of the IP. Our 
rounding algorithm is presented in Section 3.4, and its correctness is proved in Section 3.5. Finally, 
to assist with the exposition, we present a series of examples in Appendix B to illustrate constructions 
used by our algorithm. 

3.1 Highlights of our algorithm 

3.1.1 Linearization & formulating the IP 

A major complication of our setting is that a node's activation decisions can depend on remote 
nodes in the graph. Consider a step in the diffusion process where there are multiple disconnected 
active components, and at time t, a single node u activates and joins these components into a single 
'giant' active component. This event would dramatically change the utility of nodes that are distant 
from node u but adjacent to the new giant active component. These dramatic, non-local changes make 
it difficult to encode the problem as an IP; for instance, consider a natural IP formulation that 
uses indicator variables y_{i,t} that are set iff node u_i is activated at timestep t. The IP would need 
to decide if the size of the active component including u_i at timestep t exceeds θ(u_i), a task that 
would likely require the use of threshold gates. This complicates matters, since IPs cannot easily 
express threshold gates. 

It turns out that we can avoid threshold gates if the IP is only required to give a 
2-approximation to the optimal seedset. To do this, we introduce the following notion: 

Activation sequences. Given a seedset S, we can define an activation sequence T as a 
permutation from V to [n] that indicates the order in which nodes activate. While Definition 2.1 
fixes the exact timestep at which each node activates, this 
notion of activation sequence is looser. Here, a seed may activate at any timestep, and a non-seed 
node u may activate at a timestep T(u) as long as u is part of a connected component of size at 
least θ(u) in the subgraph induced by {u} ∪ {v : T(v) < T(u)}. 

Notice that, for the connected activation sequences introduced below, we can uniquely recover a feasible seedset S from an activation sequence T by declaring that 
node u is a seed iff θ(u) > T(u). Thus, our IP will encode an activation sequence T as a proxy 
for the seedset S. 

We can use activation sequences to convert a node's activation decision from global (i.e., 
influenced by remote nodes) to local (i.e., influenced only by neighbors). To do this, we restrict the 
search space of our IP to connected activation sequences, i.e., activation sequences T such that at 
every timestep t, the set of active nodes induces a connected subgraph of G. This way, we know 
that there are exactly t active and connected nodes at every timestep t, so any inactive node u can 
decide to activate at timestep t if (a) at least one of its neighbors is active, and (b) the current timestep 
satisfies t ≥ θ(u). Armed with these observations, encoding the IP becomes straightforward; see Figure 1. 
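The local test described above can be sketched as follows. This is a hypothetical helper, assuming order[t − 1] is the node that activates at timestep t and that nodes are keys of an adjacency dictionary. Requiring each new node to have an already-active neighbor keeps every prefix connected by induction, so the t-th node joins a component of exactly t nodes and is a seed precisely when t < θ(u).

```python
# Hypothetical helper illustrating connected activation sequences; not from the paper.


def seeds_of_connected_sequence(adj, theta, order):
    """Return the seedset implied by a connected activation sequence.

    order[t - 1] is the node that activates at timestep t. Raises ValueError
    if some node (after the first) has no previously activated neighbour,
    i.e. if the sequence is not connected.
    """
    active, seeds = set(), set()
    for t, u in enumerate(order, start=1):
        # Local test: a new node must touch the (connected) active set.
        if t > 1 and not (adj[u] & active):
            raise ValueError(f"node {u} at timestep {t} has no active neighbour")
        # u joins a connected component of exactly t nodes, so it must be
        # seeded iff that component is still smaller than its threshold.
        if t < theta[u]:
            seeds.add(u)
        active.add(u)
    return seeds
```

Only neighbors of u are inspected at each step, which is exactly what lets the IP avoid threshold gates.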

Simple IP encoding. Let x_{i,t} be the indicator variable such that x_{i,t} = 1 if and only if 
T(u_i) = t. The permutation constraints guarantee that the variables x_{i,t} represent a permutation. 
The connectivity constraints ensure that if x_{i,t} = 1 for some t > 1 (i.e., node u_i activates at step t), there is some 
other node u_{i'} that (a) is a neighbor of node u_i and (b) activates at an earlier time t' < t. 
Finally, the objective function minimizes the size of the seedset by counting the number of x_{i,t} = 1 
such that t < θ(u_i). 
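For very small graphs, the simple IP of Figure 1 can be solved by brute force, which is a handy sanity check on the encoding. The sketch below (hypothetical function names; 0-based time indices, so column s of x stands for timestep s + 1) evaluates the permutation constraints, the connectivity constraints and the objective for a 0/1 matrix x, and then enumerates all n! permutation matrices. It is exponential and meant only as an illustration, not as a practical solver.

```python
from itertools import permutations

# Brute-force illustration of the Figure 1 IP; function names are hypothetical.


def check_simple_ip(adj, theta, x):
    """Objective of the Figure 1 IP at a 0/1 matrix x, or None if infeasible.

    Nodes are 0..n-1; x[i][s] == 1 iff node i activates at timestep s + 1.
    """
    n = len(x)
    # Permutation constraints: each node once, one node per timestep.
    if any(sum(row) != 1 for row in x):
        return None
    if any(sum(x[i][s] for i in range(n)) != 1 for s in range(n)):
        return None
    # Connectivity constraints for timesteps t > 1 (column index s >= 1).
    for i in range(n):
        for s in range(1, n):
            support = sum(x[j][r] for j in adj[i] for r in range(s))
            if x[i][s] > support:
                return None
    # Objective: activations at timesteps t < theta(u_i), i.e. s < theta - 1.
    return sum(x[i][s] for i in range(n) for s in range(theta[i] - 1))


def solve_simple_ip(adj, theta):
    """Exhaustive search over all permutation matrices (tiny instances only)."""
    n = len(adj)
    best = None
    for order in permutations(range(n)):
        x = [[0] * n for _ in range(n)]
        for s, i in enumerate(order):
            x[i][s] = 1
        value = check_simple_ip(adj, theta, x)
        if value is not None and (best is None or value < best):
            best = value
    return best
```

Since every threshold is at least 2, the node placed at timestep 1 is always counted as a seed, so the optimum is at least 1 on any instance.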

Bounding the size of the seedset. To see why restricting our search space to connected 
activation sequences results in a 2-approximation, consider a timestep when two or more disconnected 
active components merge into a single component, and notice that whenever this happens, there 
is exactly one connector node that activates and joins these components. It turns out that if 
we add every connector node to the optimal seedset, we can rearrange the activation sequence to 
enforce connectivity. Every connector node decreases the number of disconnected active 
components by at least one, and since every threshold is at least 2, only a seed can start a new 
active component; hence the number of connector nodes is smaller than the size of the optimal seedset, and we 
have the following lemma (proved in Appendix C): 

Lemma 3.1. The smallest seedset that can induce a connected activation sequence is at most twice 
the size of optimal seedset. 
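Lemma 3.1 can be spot-checked on tiny instances by brute force: compute the optimal seedset by enumerating subsets and simulating Definition 2.1, and compute the fewest seeds over all connected activation sequences by enumerating permutations. This is a sketch that assumes n is small enough for exhaustive search; the function names are illustrative, and checking individual instances proves nothing in general.

```python
from itertools import combinations, permutations

# Exhaustive spot-check of Lemma 3.1 on tiny graphs; names are illustrative.


def final_active(adj, theta, seeds):
    """Active set at the completion time of the Definition 2.1 process."""
    active = set(seeds)
    while True:
        newly_on = set()
        for u in adj:
            if u in active:
                continue
            # Grow u's component inside the subgraph induced by active ∪ {u}.
            comp, stack = {u}, [u]
            while stack:
                v = stack.pop()
                for w in adj[v]:
                    if w in active and w not in comp:
                        comp.add(w)
                        stack.append(w)
            if len(comp) >= theta[u]:
                newly_on.add(u)
        if not newly_on:
            return active
        active |= newly_on  # simultaneous update, as in Definition 2.1


def opt_seedset(adj, theta):
    """Size of a smallest feasible seedset, by exhaustive search."""
    nodes = sorted(adj)
    for k in range(1, len(nodes) + 1):
        for seeds in combinations(nodes, k):
            if final_active(adj, theta, seeds) == set(nodes):
                return k
    return None


def opt_connected_sequence(adj, theta):
    """Fewest seeds over all connected activation sequences."""
    best = None
    for order in permutations(sorted(adj)):
        active, seeds = set(), 0
        for t, u in enumerate(order, start=1):
            if t > 1 and not (adj[u] & active):
                break  # prefix not connected: discard this order
            if t < theta[u]:
                seeds += 1
            active.add(u)
        else:  # no break: the whole order is connected
            if best is None or seeds < best:
                best = seeds
    return best
```

On the path 0-1-2-3-4 with thresholds (5, 2, 5, 2, 5), both quantities equal 2: the seedset {0, 4} is optimal, while in a connected sequence the middle node can never be the last to activate and so is always counted as a seed, and the two endpoints cannot both activate at timestep 5.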

3.1.2 Network flows & randomized rounding 

Unfortunately, we can't use the simple IP of Figure 1 to design our approximation algorithm, 
as it may exhibit a large integrality gap (see Appendix A). To deal with this, we need a new idea 
for our rounding approach: we shall simultaneously interpret fractional values returned by the LP 
both as network flows and as probabilities. 

The diffusion process as network flows. When a node u activates at time T(u), we imagine a unit 
flow that originates at a seed node and flows to node u along the network induced by the nodes 
activated prior to timestep T(u). Our IP encodes this via flow constraints, which serve two purposes. 
First, they eliminate the pathological example of Appendix A. Second, they force our LP to 
return fractional flows f ∈ [0, 1] that have the following pleasant interpretation: if there is a flow 
f ∈ [0, 1] from a seed node to a node u at time t, then node u has probability f of activating at 
time t. 
Fractional flows as probability mass. Suppose that at time t, there are two disjoint flows f_1 and 
f_2 originating from different seeds and arriving simultaneously at node u. The total flow at node 
u at time t is then f_1 + f_2. What does this merge of two disjoint flows mean in our probabilistic 
interpretation? It turns out that the natural interpretation is already pretty sensible: with 
probability f_1, the technology is diffused via the first flow, and with probability f_2 the technology is 
diffused via the second flow. Treating these two events as independent, the probability that the technology is diffused to u via either of 
these two flows is 1 − (1 − f_1)(1 − f_2). If f_1, f_2 are both small, this probability is ≈ f_1 + f_2, 
so that the total flow can be used to determine node u's activation probability. On the other hand, 
if f_1 or f_2 is large, we are fairly confident that u should activate prior to time t, and so we can 
simply decide that T(u) < t without incurring a large increase in the size of the seedset. 
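The approximation above can be made precise with a short calculation, assuming, as in this interpretation, that the two flows act as independent events with probabilities f_1, f_2 ∈ [0, 1]:

$$
1-(1-f_1)(1-f_2) \;=\; f_1+f_2-f_1f_2,
\qquad
\frac{f_1+f_2}{2} \;\le\; \max(f_1,f_2) \;\le\; 1-(1-f_1)(1-f_2) \;\le\; f_1+f_2 .
$$

So replacing the probability by the total flow f_1 + f_2 introduces an error of exactly f_1 f_2, and for any f_1, f_2 ∈ [0, 1] the total flow overestimates the activation probability by at most a factor of 2.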

Connecting the seedset. Since network flows must originate at seed nodes, our rounding 
algorithm will require all seed nodes to activate before the non-seed nodes. Coupling this with the 
requirement that our activation sequence is connected, it follows that our rounding algorithm will 
return a connected seedset S (i.e., the nodes in S induce a connected subgraph of G). To guarantee 
a connected seedset, our rounding algorithm samples candidate seed nodes and glues them together 
as follows: 

Glue-Seeds(S) 

1 while S is not connected 

2 do Let C be the largest connected component in the subgraph induced by S. 

3 Pick u ∈ S \ C. Let P be the shortest path connecting u and C in G. 

4 Add nodes in P to S. 

5 return S. 

If r is the diameter of the graph (i.e., the length of the longest shortest path in G), then gluing 
incurs a factor of O(r) increase in the size of the seedset, which we show is optimal in Lemma 4.2. 
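Glue-Seeds can be sketched in Python as follows, assuming an adjacency dictionary for a connected graph G. A breadth-first search from the chosen seed u stops at the first node of C it reaches, which yields a shortest path P from u to C. Each iteration merges u's component into C and adds at most r − 1 new nodes, and there are at most |S| − 1 iterations, so the output has at most |S| + (|S| − 1)(r − 1) ≤ r|S| nodes, which is the O(r) factor mentioned above. The function names are illustrative.

```python
from collections import deque

# Illustrative Python rendering of Glue-Seeds; helper names are our own.


def components(adj, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    remaining, comps = set(nodes), []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w in remaining:
                    remaining.remove(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def shortest_path_to_set(adj, source, targets):
    """Nodes of a shortest path in G from `source` to the nearest node of `targets`."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v in targets:
            path = []
            while v is not None:  # walk back to the source
                path.append(v)
                v = parent[v]
            return path
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    raise ValueError("G is disconnected")


def glue_seeds(adj, seeds):
    """Add shortest connecting paths until the seedset induces a connected subgraph."""
    S = set(seeds)
    while True:
        comps = components(adj, S)
        if len(comps) <= 1:  # S is connected (or empty)
            return S
        C = max(comps, key=len)  # largest connected component of S
        u = next(iter(S - C))  # any seed outside C
        S |= set(shortest_path_to_set(adj, u, C))  # add the nodes of P to S
```

For instance, on the path 0-1-2-3-4, gluing the seeds {0, 4} returns all five nodes, while gluing {1, 3} adds only node 2.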



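$$
\begin{aligned}
\min\quad & \sum_{i\in[n]}\ \sum_{t<\theta(u_i)} x_{i,t} \\
\text{subject to:}\quad
& x_{i,t}\in\{0,1\} && \forall i,t && \\
& \sum_{t\le n} x_{i,t} = 1 && \forall i && \text{(permut'n constraints)} \\
& \sum_{i\le n} x_{i,t} = 1 && \forall t && \text{(permut'n constraints)} \\
& e_{i',i,t',t} = 0 && \forall \{i',i\}\notin E(G),\ t',t\in[n];\ \text{and } \forall \{i',i\}\in E(G),\ t'\ge t && \\
& e_{i',i,t',t}\in\{0,1\} && \forall \{i',i\}\in E(G),\ t'<t && \\
& \sum_{t'<t}\ \sum_{\{i',i\}\in E} e_{i',i,t',t} = x_{i,t} && \forall i,\ t>1 && \text{(tree constraints)} \\
& \sum_{\{i',i\}\in E} e_{i',i,t',t} \le x_{i',t'} && \forall i',t',t && \text{(activity constraints)} \\
& x_{1,1} = 1 && && \text{(make } x_{1,1} \text{ the source)} \\
& \sum_{e\in\delta(S,\bar S)} c(e) \ge \sum_{\theta(u_i)\le t'\le t} x_{i,t'} && \forall i,\ t\ge\theta(u_i),\ \forall \text{ partitions } (S,\bar S) \text{ of } V(\mathbb{H}_t(i)) \text{ s.t. } x^{+}_{1,1}\in S,\ sk\in\bar S && \text{(flow constraints)}
\end{aligned}
$$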
Figure 2: Integer program for solving the technology diffusion problem. 
3.2 Integer program 

In Figure 2, we present the IP we use to design our approximation algorithm. We replace 
the simple connectivity constraints of the IP in Figure 1 with a more robust set of constraints that 
use network flows to enforce connectivity, i.e., that all active nodes are connected, and that node 
u activates only after there are at least θ(u) − 1 active nodes, one of which is u's 
neighbor. 

Network flows. We require that when x_{i,t} = 1, we can push a unit flow from 
some seed node to u_i using only the subgraph induced by nodes activated before time 
t. To do this, we first extract the subgraph containing all the active nodes at time t, and then 
ensure that this subgraph admits a unit flow from a seed node to node u_i. We can do this by 
tracing the "trajectory" of the diffusion process, since when the seedset is connected, the technology 
diffusion process can be viewed as a growing tree. (See the example in Figure 6 of Appendix B.) 
We therefore impose our flow constraints over the following tree: 

Timestamped diffusion tree. Node T^{-1}(1) is the root. For each t > 1, the tree consists of 
all the nodes that are activated on or before time t. Moving from step t − 1 to step t, the node 
u = T^{-1}(t) is appended to the tree as a new leaf by adding a single directed edge to u from an 
arbitrary tree node v such that v ∈ {T^{-1}(1), ..., T^{-1}(t − 1)} and edge {u, v} is in the graph G. 
Finally, tree edge (T^{-1}(t'), T^{-1}(t)) is labeled with timestamps (t', t). To encode the timestamped 
diffusion tree in our IP, we use edge variables e_{i',i,t',t} such that e_{i',i,t',t} = 1 iff edge {i', i} is in the 
timestamped diffusion tree with label (t', t). The tree constraints ensure that each node in the 
timestamped diffusion tree other than the root has exactly one incoming edge, while the activity constraints ensure 
that only active nodes may have children in the tree. 
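A sketch of building the timestamped diffusion tree from a connected activation sequence is given below; it returns the non-zero edge variables as tuples (i', i, t', t), with edges directed from parent to child so that every non-root node has exactly one incoming edge. The text allows any earlier-activated neighbor to serve as the parent; picking the earliest one is our own arbitrary tie-break, and the function name is hypothetical.

```python
# Hypothetical helper: non-zero e variables of a timestamped diffusion tree.


def timestamped_tree(adj, order):
    """Return {(parent, child, t_parent, t_child)} for a connected sequence.

    order[t - 1] activates at timestep t. Each new node is attached to one
    earlier-activated neighbour (here the earliest, an arbitrary choice).
    """
    time, edges = {}, set()
    for t, u in enumerate(order, start=1):
        if t > 1:
            earlier = [v for v in adj[u] if v in time]
            if not earlier:
                raise ValueError(f"order is not connected at timestep {t}")
            parent = min(earlier, key=time.get)
            edges.add((parent, u, time[parent], t))  # edge labeled (t', t)
        time[u] = t
    return edges
```

On the path 0-1-2-3 with activation order 1, 0, 2, 3, node 1 is the root, nodes 0 and 2 hang from it with labels (1, 2) and (1, 3), and node 3 hangs from node 2 with label (3, 4).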

To impose the flow constraints, we use the following hypergraph ℋ (see the example in Figure 7 
of Appendix B): 

The hypergraph ℋ. ℋ has vertex set {x_{i,t} : i, t ∈ [n]}, where vertex x_{i,t} has "mass" x_{i,t}. For 
every non-zero e_{i',i,t',t}, ℋ has a directed edge from x_{i',t'} to x_{i,t} with capacity e_{i',i,t',t}. We also call 
vertices {x_{i,t} : t = θ(u_i)} the threshold vertices of ℋ, and let the threshold line be a line that joins 
all threshold vertices. All the vertices x_{i,t} such that t < θ(u_i) are referred to as vertices to the left 
of the threshold line; all the rest are vertices to the right of the threshold line. 

We think of the vertices to the left of the threshold line in ℋ as corresponding to seed nodes, while 
those to the right correspond to non-seed nodes. We want to ensure that every vertex x_{i,t} to the right of the 
threshold line (corresponding to a non-seed) has mass x_{i,t} that exceeds neither (a) the capacity 
of its incoming edges e_{i',i,t',t}, nor (b) the mass x_{i',t'} at the vertices that induced node u_i to activate. 
To do this, we define a family of multi-flow problems as follows: 

The (i,t)-flow problem. Fix an arbitrary i and t such that t ≥ θ(u_i). Let j be the node 
corresponding to the root of the timestamped diffusion tree, i.e., j activates at the first timestep so 
that x_{j,1} = 1. Let ℋ_t be the subgraph of ℋ induced by the vertices {x_{i',t'} : i' ∈ [n], t' ≤ t}, where 
the mass on each of ℋ_t's vertices is interpreted as its capacity, and the directed edge from x_{i',t'} to 
x_{i,t} has capacity e_{i',i,t',t}. The (i,t)-flow problem is a multiple-sink flow problem over ℋ_t where 
the source is x_{j,1} and the sinks are the vertices to the right of the threshold line, x_{i,θ(u_i)}, ..., 
x_{i,t}. The demand for sink x_{i,t'} is x_{i,t'}. The flow constraints ensure that there is a solution to the 
(i,t)-flow problem for each i and t ∈ [n] such that t ≥ θ(u_i). 

Implementing the flow constraints. The (i,t)-flow constraints are enforced via 
max-flow-min-cut, i.e., by using the fact that the minimum cut between the source and sinks equals 
the maximum flow. For each (i,t)-flow problem, a simple min-cut formulation requires: 

1. knowledge of the source, so we assume x_{1,1} is known to be the source. (A simple, 
polynomial-time way to achieve this is by guessing: run the IP n times, relabeling a different node in the 
graph as u_1 for each run, and use the run that returns the smallest seedset. The subsequent 
discussion corresponds to this "correct" run.) 

2. a single sink, so we introduce a super-sink node sk and connect every sink x_{i,t'} to it, 

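3. capacities on edges only, so we consider a new hypergraph ℍ_t(i) that is identical to ℋ_t except 
that each vertex x_{j,t'} is replaced with two vertices x^-_{j,t'} and x^+_{j,t'} (receiving the incoming and emitting the 
outgoing edges, respectively), connected by a directed edge of capacity x_{j,t'}. 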

In each hypergraph ℍ_t(i), we need to supply sk with demand Σ_{θ(u_i) ≤ t' ≤ t} x_{i,t'}. Let S and S̄ be a 
partition of the vertices of ℍ_t(i), where x^+_{1,1} ∈ S and sk ∈ S̄. Let δ(S, S̄) be the set of cross edges 
from S to S̄, and let c(e) be the capacity of each edge e in ℍ_t(i). The flow constraints require that 
the capacity of every such cut is at least as large as the demand at the sink node. 
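Because the flow constraints range over exponentially many cuts, one way to check them for a given (possibly fractional) solution is to compute a maximum flow in ℍ_t(i) and compare it with the demand, by max-flow-min-cut. The sketch below does this with Edmonds-Karp, a standard augmenting-path max-flow algorithm that we swap in for illustration (the text does not prescribe a solver). Assumptions, all our own: nodes are 1-indexed with u_1 as the root, x maps (j, s) to mass, e maps (j', j, s', s) to capacity, and each sink x^+_{i,s} is joined to sk by an edge of capacity x_{i,s}, a detail the text leaves open.

```python
from collections import defaultdict, deque

# Illustrative check of one (i, t)-flow constraint; names and encoding are assumptions.


def max_flow(cap, source, sink, eps=1e-12):
    """Edmonds-Karp maximum flow; cap[u][v] is the capacity of edge u -> v."""
    residual = defaultdict(lambda: defaultdict(float))
    for u, out in cap.items():
        for v, c in out.items():
            residual[u][v] += c
    total = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(residual[u].items()):
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= bottleneck
            residual[b][a] += bottleneck
        total += bottleneck


def flow_constraint_holds(x, e, theta, i, t, root=1):
    """True iff the (i, t)-flow demand can be routed in the split graph H_t(i)."""
    cap = defaultdict(dict)
    for (j, s), mass in x.items():
        if s <= t:
            cap[("in", j, s)][("out", j, s)] = mass  # split vertex: capacity = mass
    for (jp, j, sp, s), c in e.items():
        if sp < s <= t:
            cap[("out", jp, sp)][("in", j, s)] = c  # tree edge x_{j',s'} -> x_{j,s}
    demand = 0.0
    for s in range(theta[i], t + 1):  # sinks: right of the threshold line
        mass = x.get((i, s), 0.0)
        if mass > 0:
            cap[("out", i, s)]["sk"] = mass  # assumed sink -> sk capacity x_{i,s}
            demand += mass
    return max_flow(cap, ("out", root, 1), "sk") >= demand - 1e-9
```

For the integral solution of the path u_1 - u_2 - u_3 activated in that order, every (i, t)-flow constraint holds; halving the capacity of the tree edge into u_3 makes the (3, 3)-flow constraint fail, since only half a unit can then reach x_{3,3}.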

3.3 Correctness of the IP 

We combine Lemma [3.11 with the following lemma to conclude that the IP in Figure [2] returns a 
seedset of size 2opt. (Strictly speaking, this holds only if when ui is a seed in an optimal connected 
activation sequence due the constraint "(make Xn the source)"; when this doesn't hold, the seedset 
has size at most 2opt + r, but as we discussed above we can ignore these runs of the IP.) 

Lemma 3.2. The IP in Figure 2 returns a connected activation sequence. 

Proof. We show that (a) all activation sequences satisfying the constraints of the IP must be 
connected, and (b) all connected activation sequences must satisfy the constraints. 

We start with the first item. We use induction over t to show that the constraints in Figure 2, 
excluding the flow constraints, force the IP to consider only connected activation sequences. The 
base case t = 1 is trivial because there is only one activated node, namely u_1. For the 
induction step, suppose the set of active nodes is connected up to time t, and x_{i,t+1} = 1 for some 
i. The tree constraints ensure that there exist i' and t' < t + 1 such that e_{i',i,t',t+1} = 1. The 
activity constraints also ensure that x_{i',t'} ≥ e_{i',i,t',t+1} = 1. Therefore, there exists a node u_{i'} that 
is activated before time t + 1 and is adjacent to u_i, so that the set of active nodes is also connected 
at time t + 1. 

To show the second item, we show that the flow constraints do not rule out any connected 
activation sequences. To do this, fix arbitrary i and t and consider two cases: 

Case 1. If $x_{i,t'} = 0$ for all $t' < t$, then there is no demand at any of the sinks in the $(i,t)$-flow problem, so the flow constraints trivially cannot rule out any solutions. 

Case 2. Suppose there exists exactly one $t' < t$ where $x_{i,t'} = 1$. From the definition of the $(i,t)$-flow problem, we must have $t' > \theta(u_i)$. Since $T$ is a connected activation sequence, it can be associated with a timestamped diffusion tree that has a path from $u_1$ (the first node to activate) to $u_i$. It follows that there must also be a unit flow from $x_{1,1}$ to $x_{i,t'}$ in $\mathcal{H}$, and so this flow is a solution for the $(i,t)$-flow problem. □ 
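The induction in part (a) can be turned into a direct test. The Python sketch below is our own illustration, not code from the paper; the adjacency-dict representation and the name `is_connected_sequence` are hypothetical. It relies on the same observation as the induction step: a one-node-per-step activation sequence keeps the active set connected exactly when every node after the first has an already-active neighbor.

```python
def is_connected_sequence(adj, order):
    """Return True if every prefix of `order` induces a connected subgraph.

    adj   -- dict mapping each node to an iterable of its neighbours in G
    order -- the nodes in activation order, one node per timestep
    """
    active = set()
    for u in order:
        # Induction step: the newly activated node must touch an active node.
        if active and not any(v in active for v in adj[u]):
            return False
        active.add(u)
    return True


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(is_connected_sequence(path, ["b", "a", "c"]))  # True
print(is_connected_sequence(path, ["a", "c", "b"]))  # False: {a, c} is disconnected
```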
3.4 Relaxing and rounding algorithm 

Our next task is to relax the IP in Figure 2 in the usual way¹ (replacing the indicator variables with real variables over the interval $[0, 1]$) and to round the solution. Let $\sigma$ be an optimal solution for the LP. Given $\sigma$, our rounding procedure first reconstructs both the seedset $S$ and an activation function $T$, and then iteratively reconciles inconsistencies between $S$ and $T$. 

3.4.1 Properties of the rounding procedure 

When we work with the relaxed LP, we also relax our notion of activation sequences. Now, 
we no longer assume that T is a permutation, and instead allow more than one node (or even no 
nodes at all) to activate in a single timestep. However, the following three properties ensure that 
this relaxed notion does not create problems: 

X1 (Consistency): After rounding is complete, $T$ and $S$ will be consistent; that is, $T$ encodes an order of activation for a diffusion process induced by $(G, \theta, S)$ in which any seed node $u \in S$ is allowed to activate at any time, and any non-seed node $u \notin S$ is allowed to activate at any time it is connected to an active component of size at least $\theta(u) - 1$. 

X2 (Connectivity): The activation sequence $T$ is such that the set of active nodes is connected at all times (i.e., $\bigcup_{t' \le t} T^{-1}(t')$ is connected in $G$ for $1 \le t \le n$). 

X3 (Feasibility): The activation sequence $T$ is such that every node eventually activates (i.e., $T(u) \le n$ for each $u \in V$). 

Thus, if the seedset S is consistent with the activation sequence T, then the seedset S is also 
feasible. 
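To make property X1 concrete, the following Python sketch checks it for a given seedset and activation function. It is an illustration under stated assumptions, not the paper's procedure: the representation (adjacency dict `adj`, thresholds `theta`, activation times `T`) is hypothetical, "active at time $T(u)$" is taken to mean activated strictly earlier (so ties, which the relaxed setting allows, do not help each other), and feasibility (X3) is checked separately.

```python
from collections import deque


def is_consistent(adj, theta, seeds, T):
    """Check X1: every non-seed u is adjacent, at time T[u], to a connected
    component of earlier-activated nodes of size >= theta[u] - 1."""
    for u, t_u in T.items():
        if u in seeds:
            continue  # seeds may activate at any time
        active = {v for v, t_v in T.items() if t_v < t_u}
        best, seen = 0, set()
        for s in adj[u]:
            if s not in active or s in seen:
                continue
            # BFS over the active component that contains neighbour s.
            size, queue = 0, deque([s])
            seen.add(s)
            while queue:
                v = queue.popleft()
                size += 1
                for w in adj[v]:
                    if w in active and w not in seen:
                        seen.add(w)
                        queue.append(w)
            best = max(best, size)
        if best < theta[u] - 1:
            return False
    return True
```

For example, on the path a-b-c with thresholds 1, 2, 3 and seedset {a}, the sequence a, b, c is consistent, while raising c's threshold to 4 makes c activate "too early".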

3.4.2 Overview of the rounding procedure 

Our rounding procedure works as follows. (To assist with the exposition, we also present an example of this rounding procedure in Appendix B.) 

Reconstructing the seedset S. We use a randomized procedure to place a graph node $u_i$ in the seedset with probability proportional to its cumulative mass to the left of the threshold line, i.e., $\sum_{t<\theta(u_i)} x_{i,t}$. This allows us to reconstruct a "small" seedset, but is not sufficient to guarantee that the reconstructed seedset is feasible. 

Reconstructing the activation function T. On the other hand, our reconstruction of the activation function $T$ guarantees that $T$ is connected and feasible, but not that $T$ encodes a "small" seedset. We do this by relying heavily on the finer information provided by the individual "masses" $x_{i,t}$, and interpreting these values as both network flows and probabilities. On the one hand, we use the flow interpretation to construct the activation sequence as a function of (a) the mass of the hypergraph nodes of $\mathcal{H}$ to the left of the threshold line (which will act as the sources of the network flows) and (b) the structure of the hypergraph $\mathcal{H}$ (which will act as the network carrying these flows). On the other hand, we use the probability interpretation to ensure that the distribution of $T(u_i)$ is, in expectation, approximately characterized by the vector $(x_{i,1}, x_{i,2}, \ldots, x_{i,n})$. 

¹Note that the relaxed LP contains an exponential number of constraints (namely, the flow constraints). Nevertheless, we can use the ellipsoid method to find an optimal solution in polynomial time using a separation oracle [31] that checks whether each of the $(j,t)$-flow problems over $\mathcal{H}$ has a solution, and if not, returns a violated min-cut constraint. This oracle can be constructed using algorithms in, e.g., [17]. 

Reconciliation. We need to worry about situations where the activation function has fewer than $t$ nodes active at timestep $t$ (inclusive). If we ignore this, $T$ and $S$ could be inconsistent: $T$ might suggest that a non-seed node $u \notin S$ activates "too early", i.e., at a time $T(u)$ when there are fewer than $\theta(u)$ active nodes. To deal with this, we reconcile the reconstructed seedset $S$ and the reconstructed activation function $T$ so that they become consistent, and we obtain a seedset that is both feasible and small. We use an iterative, $\ell$-step procedure, where $\ell$ is the number of possible threshold values in the problem instance, i.e., $\theta : V \to \{\theta_1, \ldots, \theta_\ell\}$. In each step, we again use the probability/network-flow interpretation of our problem to "repair" situations where the activation sequence $T$ has fewer than $\theta_j$ nodes activated at timestep $\theta_j$, by adding extra nodes to the seedset $S$. 

Organization. To execute the above, we first construct a "preliminary" seedset $S_0$ and activation function $T_0$ (Section 3.4.3). In the reconciliation stage (Section 3.4.4), we iteratively construct a sequence of pairs $(S_1, T_1), \ldots, (S_\ell, T_\ell)$, so that at the end $S_\ell$ is a feasible solution and $T_\ell$ is consistent with $S_\ell$ (as proved in Section 3.5). 

3.4.3 Reconstructing the preliminary seedset $S_0$ and activation function $T_0$ 

The following describes the process for obtaining the preliminary seedset $S_0$ and activation function $T_0$ from the fractional solution $\sigma$ to the LP relaxation. We use a randomized process to obtain the preliminary seedset $S_0$. Let $\epsilon > 0$ be an arbitrarily small number, which controls the tradeoff between running time and the size of the seedset: 

Prelim-Seedset($\mathcal{H}$) 

1  Initialize $S_0 \leftarrow \emptyset$. 

2  For each node $u_i \in V$, add $u_i$ to $S_0$ with probability $\min\{1,\ 24(1+\epsilon)\ln(2n) \cdot \sum_{t<\theta(u_i)} x_{i,t}\}$. 

3  Let $S_0 \leftarrow$ Glue-Seeds($S_0$). 

4  return $S_0$. 
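A minimal Python sketch of Prelim-Seedset follows, assuming the per-node left-of-threshold mass $\sum_{t<\theta(u_i)} x_{i,t}$ has already been read off $\sigma$ into a list. Glue-Seeds (Section 3.1) is not reproduced here, so `glue_seeds` is a hypothetical caller-supplied function; `rng` is injectable so that the procedure can be re-run with fresh randomness, as the text requires.

```python
import math
import random


def prelim_seedset(left_mass, eps, glue_seeds, rng=random):
    """Steps 1-3 of Prelim-Seedset (sketch).

    left_mass[i] -- sum_{t < theta(u_i)} x_{i,t}, read off the LP solution
    eps          -- the parameter epsilon > 0
    glue_seeds   -- hypothetical stand-in for Glue-Seeds (Section 3.1)
    rng          -- any object whose .random() returns a float in [0, 1)
    """
    n = len(left_mass)
    scale = 24 * (1 + eps) * math.log(2 * n)  # natural log, as in ln(2n)
    seeds = set()
    for i, mass in enumerate(left_mass):
        # rng.random() < p holds with probability exactly p for p in [0, 1].
        if rng.random() < min(1.0, scale * mass):
            seeds.add(i)
    return glue_seeds(seeds)
```

Nodes with no mass to the left of their threshold line are never sampled, while any node whose scaled mass reaches 1 is always sampled.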

(See Section 3.1 for the Glue-Seeds procedure.) We then deterministically obtain the activation sequence $T_0$. 

Get-Seq($\mathcal{H}, S_0$) 

1  Initialize by flagging each $x_{i,t} \in \mathcal{H}$ as "inactive" by setting $b_{i,t} \leftarrow 0$. 

2  $\forall u_i \in S_0$: $b_{i,t} \leftarrow 1$ for $t < \theta(u_i)$.   // "Activate" each $x_{i,t}$ to the left of the threshold line 

3  for $t \leftarrow 1$ to $n$ 

4      do $\forall i$: 

5          if $(\exists i', t' : ((x_{i',t'}, x_{i,t}) \in E(\mathcal{H})) \wedge (b_{i',t'} = 1))$ 

6              $b_{i,t} \leftarrow 1$ for $t \ge \theta(u_i)$   // "Activate" each $x_{i,t}$ to the right of the threshold line 

7  Obtain $T_0$ by taking $T_0(u_i) \leftarrow \min\{t : b_{i,t} = 1\}$. 

8  return $T_0$. 
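The Get-Seq pseudocode can be sketched in Python as follows. This is our reading, not the authors' implementation: we assume every hyperedge goes strictly forward in time ($t' < t$), and that line 6 flags only the hypergraph node $x_{i,t}$ under consideration, and only when it lies to the right of $u_i$'s threshold line. The argument names are hypothetical, and a node that is never flagged gets `math.inf`, matching $T_0(u) = \infty$.

```python
import math
from collections import defaultdict


def get_seq(theta, n_steps, edges, seeds):
    """Sketch of Get-Seq.

    theta   -- theta[i] is the threshold theta(u_i) of graph node i
    n_steps -- number of timesteps n
    edges   -- hyperedges ((i2, t2), (i, t)) of H, assumed to satisfy t2 < t
    seeds   -- the seedset S_0, as a set of node indices
    Returns T with T[i] = first flagged timestep of u_i, or math.inf if none.
    """
    incoming = defaultdict(list)
    for src, dst in edges:
        incoming[dst].append(src)
    b = defaultdict(int)  # line 1: every flag b[(i, t)] starts at 0
    for i in seeds:  # line 2: flag seeds to the left of the threshold line
        for t in range(1, theta[i]):
            b[(i, t)] = 1
    for t in range(1, n_steps + 1):  # lines 3-6
        for i in range(len(theta)):
            if t >= theta[i] and any(b[src] for src in incoming[(i, t)]):
                b[(i, t)] = 1
    # line 7: T_0(u_i) is the earliest flagged timestep
    return {
        i: min((t for t in range(1, n_steps + 1) if b[(i, t)]), default=math.inf)
        for i in range(len(theta))
    }
```

For instance, with thresholds (2, 2, 3), hyperedges $(0,1)\to(1,2)$ and $(1,2)\to(2,3)$, and seedset {0}, the sketch returns $T_0 = \{0: 1, 1: 2, 2: 3\}$; with an empty seedset every node gets `math.inf`.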

Notice that it is possible that $T_0$ is infeasible, i.e., that some node $u$ never activates under $T_0$ (denoted by $T_0(u) = \infty$). (See, for example, the first failure mode in Appendix B.) Thus, we repeat Prelim-Seedset($\mathcal{H}$) with fresh randomness until we obtain an $S_0$ that satisfies the two properties below. In Theorem 3.3 we show that a small number of repetitions suffices to find an $S_0$ that satisfies: 

(P.1) Let $T_0 \leftarrow$ Get-Seq($\mathcal{H}, S_0$). For all $i, t$ with $\sum_{t' \le t} x_{i,t'} \ge \frac{1}{12(1+\epsilon)}$, we have $T_0(u_i) \le t$. 
(P.2) The size of $S_0$ is at most 

$$|S_0| \le 24(1+\epsilon)^2 \ln(2n)\, r \cdot (2\mathrm{opt}). \qquad (1)$$
Note that (P.1) immediately implies that the activation sequence $T_0$ is feasible: set $t = n$ in (P.1), so that $\sum_{t' \le n} x_{i,t'} = 1 \ge \frac{1}{12(1+\epsilon)}$ for every $i$, and hence $T_0(u_i) \le n$. 
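Checking (P.1) for a candidate $S_0$ is mechanical once $T_0$ is known. The Python sketch below takes the right-hand side of the inequality in (P.1) as a parameter `bound` rather than hard-coding a constant, and assumes a hypothetical layout in which `x[i][t-1]` stores $x_{i,t}$.

```python
def satisfies_p1(x, T0, bound):
    """Check property (P.1).

    x     -- x[i][t-1] holds x_{i,t} for t = 1..n (hypothetical layout)
    T0    -- T0[i] is T_0(u_i), or float("inf") if u_i never activates
    bound -- right-hand side of the inequality in (P.1)
    """
    for i, row in enumerate(x):
        cumulative = 0.0
        for t, mass in enumerate(row, start=1):
            cumulative += mass  # sum_{t' <= t} x_{i,t'}
            if cumulative >= bound and T0[i] > t:
                return False
    return True
```

Because each row sums to 1, any node with $T_0(u_i) = \infty$ fails the check, which is exactly the feasibility consequence noted above.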

3.4.4 Reconciliation procedure 

The reconciliation procedure takes as input the inconsistent preliminary seedset $S_0$ and activation function $T_0$, and uses an $\ell$-stage process to reconcile them. (Recall that $\ell$ is the number of possible thresholds in our problem, i.e., $\theta : V \to \{\theta_1, \ldots, \theta_\ell\}$.) 

We do this iteratively. At the $k$th stage of the reconciliation procedure (for each $k \in \{1, \ldots, \ell\}$), we assume that the seedset $S_{k-1}$ and activation function $T_{k-1}$ from the previous stage are "good" up to timestep $\theta_{k-1}$, and use these to produce $S_k$ and $T_k$ that are "good" up to timestep $\theta_k$. Our notion of "goodness up to time $\theta_k$" for the seedset $S_k$ and activation function $T_k$ is defined as follows: 

(C.1) $S_k$ and $T_k$ are partially consistent up to time $\theta_k$ (inclusive). That is, for any node $u$ such that $T_k(u) \le \theta_k - 1$, either (a) $u \in S_k$ (i.e., $u$ is a seed), or (b) in the subgraph induced by $u$ and the set of nodes that are active up to time $\theta_k - 1$ according to $T_k$, the connected component containing $u$ has size at least $\theta(u)$. 

(C.2) $T_k$ is such that the number of active nodes at time $\theta_j - 1$ (inclusive) is at least $\theta_j - 1$ for every $j \le k$. 

(C.3) $S_k$ grows by an additive term of at most 

$$|S_k \setminus S_{k-1}| \le r \max\{\log(2n/\epsilon),\ 24(1+\epsilon) \cdot (2\mathrm{opt})\}. \qquad (2)$$

Thus, the $k$th stage of the reconciliation procedure takes the seedset $S_{k-1}$ and produces a new seedset $S_k \supseteq S_{k-1}$, as follows: 

Update-Seedset($\mathcal{H}, S$) 

1  For each non-seed node $u_i \in V \setminus S$, add $u_i$ to $S$ with probability $\min\{1,\ 4(1+\epsilon) \cdot \sum_{t<\theta(u_i)} x_{i,t}\}$. 

2  Let $S \leftarrow$ Glue-Seeds($S$). 

3  return $S$. 
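A minimal Python sketch of Update-Seedset, assuming the per-node left-of-threshold masses $\sum_{t<\theta(u_i)} x_{i,t}$ are precomputed from $\sigma$. The function `glue_seeds` is a hypothetical stand-in for Glue-Seeds (Section 3.1), which is not reproduced here. Only non-seed nodes are sampled and the input seedset is copied rather than mutated, so the result always contains $S_{k-1}$.

```python
import random


def update_seedset(left_mass, seeds, eps, glue_seeds, rng=random):
    """Sketch of Update-Seedset.

    left_mass[i] -- sum_{t < theta(u_i)} x_{i,t}, read off the LP solution
    seeds        -- the current seedset S_{k-1} (not modified)
    glue_seeds   -- hypothetical stand-in for Glue-Seeds (Section 3.1)
    rng          -- any object whose .random() returns a float in [0, 1)
    """
    new_seeds = set(seeds)
    for i, mass in enumerate(left_mass):
        # Step 1: each non-seed node joins independently.
        if i not in seeds and rng.random() < min(1.0, 4 * (1 + eps) * mass):
            new_seeds.add(i)
    return glue_seeds(new_seeds)  # step 2
```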

Using $S_k \leftarrow$ Update-Seedset($\mathcal{H}, S_{k-1}$), we can obtain a new activation sequence as $T_k \leftarrow$ Get-Seq($\mathcal{H}, S_k$). As before, we repeat Update-Seedset($\mathcal{H}, S_{k-1}$) with fresh randomness until we obtain $S_k$ and $T_k$ that satisfy (C.1)-(C.3). In Theorem 3.3 we show that a small number of repetitions suffices. 
3.5 Correctness of the approximation algorithm 

Finally, we prove the correctness of our approximation algorithm. 

Theorem 3.3. Our algorithm outputs a feasible seedset of size at most 

$$24(1+\epsilon)^2 \ln(2n)\, r \cdot (2\mathrm{opt}) + \ell \cdot r \max\{\ln(2.89n/\epsilon),\ 24(1+\epsilon)(2\mathrm{opt})\} \qquad (3)$$

by repeating Prelim-Seedset at most $O(1)$ times and Update-Seedset at most $O(n\ell/\epsilon)$ times. 

Proof. Our proof proceeds by showing that the preliminary seedset $S_0$ is 'small', and that the preliminary activation sequence $T_0$ is feasible and connected. We then inductively prove that the reconciliation procedure resolves inconsistencies between the seedset and the activation sequence by adding a small number of new seeds to the seedset; because the seedset is consistent with a feasible activation sequence, the seedset is also feasible, and the theorem follows. 

Preliminary seedset $S_0$ and activation sequence $T_0$. We start by showing that a small number of repetitions of Prelim-Seedset suffices to obtain a "small" preliminary seedset $S_0$ of size at most $24(1+\epsilon)^2 \ln(2n)\, r (2\mathrm{opt})$ (i.e., the first term in equation (3)), and a preliminary activation sequence $T_0$ that is both feasible and connected. Let $\mathcal{H}$ be the hypergraph obtained from the optimal solution to the relaxation of the IP in Figure 2. The following suffices to show that we obtain a "small" preliminary seedset after a few repetitions of Prelim-Seedset: 

Lemma 3.4. A single trial of Prelim-Seedset($\mathcal{H}$) satisfies property (P.2) with probability $1 - o(1)$. 

We prove this as Lemma H.1 (Appendix H.1). The proof uses a Chernoff bound to show that Prelim-Seedset selects at most $24(1+\epsilon)^2 \ln(2n)\,|\sigma|$ seeds with high probability, where $|\sigma|$ denotes the objective value of $\sigma$. The proof proceeds to argue that since the optimal solution $\sigma$ to the LP has size at most $2\mathrm{opt}$ (by Lemmas 3.1 and 3.2), and the Glue-Seeds procedure used in Prelim-Seedset expands the seedset by a factor of at most $r$, (P.2) holds with high probability, so that the size of $S_0$ is bounded by the first term in equation (3). 

We move on to the preliminary activation sequence $T_0 \leftarrow$ Get-Seq($\mathcal{H}, S_0$). To show that $T_0$ is connected, we use the fact that $S_0$ is connected and the following connectivity lemma (which follows by construction of Get-Seq, as proved in Appendix H.3): 

Lemma 3.5 (Connectivity). If $S$ is a connected seedset and $T \leftarrow$ Get-Seq($\mathcal{H}, S$), then $T$ is connected. 

Finally, note that the feasibility of $T_0$ (i.e., $T_0(u_i) \le n$ for all $u_i \in V$) follows immediately from property (P.1): set $t = n$ in (P.1), so that $\sum_{t' \le n} x_{i,t'} = 1 \ge \frac{1}{12(1+\epsilon)}$. Furthermore: 

Lemma 3.6. A single trial of Prelim-Seedset($\mathcal{H}$) satisfies property (P.1) with probability $\Omega(1)$. 

We provide a sketch of this more substantial argument here, while the full proof is in Appendix H.2 as Lemma H.2. The proof relies heavily on the idea that the $x_{i,t}$ can be thought of as both network flows and probabilities. Fix an arbitrary $u_i$ and let $t(i)$ be the smallest integer such that $\sum_{t \le t(i)} x_{i,t} \ge \frac{1}{12(1+\epsilon)}$. To simplify the exposition (this simplification is not used in our full proof), suppose that either (a) $x_{i,t} = 0$ for all $t \ge \theta(u_i)$, or (b) $x_{i,t} = 0$ for all $t < \theta(u_i)$. These represent the two extreme cases for our lemma; the intermediate cases can be handled by algebraic manipulations that "interpolate" between these extreme cases. 
For case (a), all the mass is to the left of the threshold line, and we use a Chernoff bound to show that $u_i$ is a seed with high probability. The interesting part of the proof comes in case (b), when all the mass is to the right of the threshold line. To address this, we look at the hypergraph $\mathcal{H}$ and find a set $R$ of hypergraph nodes to the left of the threshold lines such that (i) 

$$\sum_{x_{j,t} \in R,\ t < \theta(u_j)} x_{j,t} \ge \frac{1}{12(1+\epsilon)}, \qquad (4)$$

and (ii) for each $x_{j,t} \in R$, if Get-Seq were to flag $x_{j,t}$ as "active" (i.e., $b_{j,t} = 1$), then $u_i$ would be activated by time $t(i)$. We first algorithmically extract the set of hypergraph nodes $R$ via network flow ideas; namely, we extract them from a feasible flow of the $(i, t(i))$-flow problem. Next, when (4) holds, we use probabilistic analysis to show that with high probability at least one $u_j$ corresponding to some $x_{j,t} \in R$ is selected as a seed, i.e., Prelim-Seedset returns $S_0$ such that $u_j \in S_0$. It follows that $x_{j,t}$ is flagged as active in Get-Seq, and $u_i$ activates by time $t(i)$. 

Reconciliation procedure. We show that each of the $\ell$ stages of the reconciliation procedure grows the seedset by at most the additive term in equation (2), resulting in the second term in equation (3). Moreover, we show that the procedure results in a consistent seedset and activation function. 

We do this inductively. For $k \in \{1, \ldots, \ell\}$, we show that given $S_{k-1}$ and $T_{k-1}$ that satisfy conditions (C.1)-(C.2), it suffices to repeat Update-Seedset $O(n/\epsilon)$ times in order to obtain $S_k$ and $T_k$ that satisfy conditions (C.1)-(C.3). (For the base case, note that (C.1) and (C.2) hold for the preliminary seedset $S_0$ and activation function $T_0$ if we define $\theta_0 = 0$.) We start by showing that (C.3) holds, so that seedset growth is small. 

Lemma 3.7 (Seedset growth). For the randomized Update-Seedset procedure for obtaining $S_k$ from $S_{k-1}$, we have 

$$\Pr\left[|S_k \setminus S_{k-1}| > r \max\{\log(2n/\epsilon),\ 24(1+\epsilon)(2\mathrm{opt})\}\right] \le \frac{\epsilon}{2n}.$$

While this proof uses Chernoff bounds similar to those in the proof of Lemma 3.4, we include it here in full in order to explain the somewhat 'unnatural'-looking terms in equation (3). 

Proof. Let $\Delta S_k$ be the set of seed nodes selected during step 1 of Update-Seedset (before gluing), and recall that we add $u_i$ to $\Delta S_k$ with probability $\min\{1, 4(1+\epsilon) \sum_{t<\theta(u_i)} x_{i,t}\}$. It suffices to bound $|\Delta S_k|$, since $|S_k \setminus S_{k-1}| \le r|\Delta S_k|$ after gluing in step 2 of Update-Seedset. Observe that 

$$\mathbb{E}[|\Delta S_k|] \le \sum_{i \le n} \min\Big\{1,\ 4(1+\epsilon) \sum_{t<\theta(u_i)} x_{i,t}\Big\} \le 4(1+\epsilon) \sum_{i \le n} \sum_{t<\theta(u_i)} x_{i,t} \le 4(1+\epsilon)(2\mathrm{opt})$$

(where the last inequality follows because the objective value of the LP is at most $2\mathrm{opt}$). 
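Let $\delta = \max\{\log(2n/\epsilon),\ 24(1+\epsilon)(2\mathrm{opt})\}$, where, here and in (2), $\log$ is the base-2 logarithm, so that $2^{-\log(2n/\epsilon)} = \epsilon/(2n)$. We bound the event that $|\Delta S_k| > \delta$ in two cases: 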

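Case 1: $24(1+\epsilon)(2\mathrm{opt}) \ge \log(2n/\epsilon)$. Then $\delta = 24(1+\epsilon)(2\mathrm{opt}) = 6 \times 4(1+\epsilon)(2\mathrm{opt}) \ge 6\,\mathbb{E}[|\Delta S_k|]$, so we can apply the Chernoff bound (Part 2 of Theorem G.1): 

$$\Pr[|\Delta S_k| > \delta] \le \Pr[|\Delta S_k| \ge 24(1+\epsilon)(2\mathrm{opt})] \le 2^{-24(1+\epsilon)(2\mathrm{opt})} \le 2^{-\log(2n/\epsilon)} = \epsilon/(2n).$$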

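Case 2: $24(1+\epsilon)(2\mathrm{opt}) < \log(2n/\epsilon)$. Now $\delta = \log(2n/\epsilon) > 24(1+\epsilon)(2\mathrm{opt}) \ge 6\,\mathbb{E}[|\Delta S_k|]$. Using the same Chernoff bound: 

$$\Pr[|\Delta S_k| > \delta] \le \Pr[|\Delta S_k| \ge \log(2n/\epsilon)] \le 2^{-\log(2n/\epsilon)} = \epsilon/(2n).$$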

□ 
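Both cases invoke a bound of the standard form (our reading of Part 2 of Theorem G.1): for a sum $X$ of independent Bernoulli variables with mean $\mu$ and any $R \ge 6\mu$, $\Pr[X \ge R] \le 2^{-R}$. The Python sketch below sanity-checks this numerically on the binomial (identically distributed) special case by computing the exact upper tail; it illustrates the inequality and is not a proof. The parameters are arbitrary examples, kept small so that `math.comb` values convert to floats safely.

```python
import math


def binomial_upper_tail(n, p, r):
    """Exact Pr[X >= r] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def check_chernoff_6mu(n, p):
    """Check Pr[X >= R] <= 2**(-R) for every integer R with 6*mu <= R <= n."""
    mu = n * p
    first = max(1, math.ceil(6 * mu))
    return all(binomial_upper_tail(n, p, r) <= 2.0 ** (-r) for r in range(first, n + 1))


print(check_chernoff_6mu(200, 0.01))  # mu = 2
print(check_chernoff_6mu(50, 0.02))   # mu = 1
```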
Next, we have a simple lemma (Lemma H.6) showing that the partial consistency condition (C.1) is immediate given that $S_k$ and $T_k$ satisfy condition (C.2), and $S_{k-1}$ and $T_{k-1}$ satisfy conditions (C.1) and (C.2). We leave that lemma to Appendix H.4 and move on to our main task: showing that a single trial of Update-Seedset produces $S_k$ and $T_k$ that satisfy (C.2) with probability $\Omega(\epsilon/n)$. We do this in two steps (with proofs in Appendices H.5 and H.6): 

1. Ideally, we would like the activation function $T_{k-1}$ to have $\theta_k - 1$ nodes active by time $\theta_k - 1$ (inclusive), so that (C.2) will be met after $T_{k-1}$ is updated to $T_k$. However, this may not be the case for $T_{k-1}$. Thus, we compute the gap between $\theta_k - 1$ and the number of active nodes at time $\theta_k - 1$ according to the activation function $T_{k-1}$, and show that this gap can be "filled", with probability at least $\epsilon/n$, after we execute Update-Seedset once. The following lemma, proved in Appendix H.5, follows almost immediately from algebra: 

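Lemma 3.8 (Gap size). At the beginning of the $k$th stage, define the "gap" as 

$$\rho = \theta_k - 1 - |\{u_i : T_{k-1}(u_i) \le \theta_k - 1\}|. \qquad (5)$$

Let $\gamma$ be the total mass in $\mathcal{H}$ to the left of the threshold line $\theta_k$ corresponding to the non-seed nodes $u_i \in V \setminus S_{k-1}$ that are not yet active at time $\theta_k - 1$ under $T_{k-1}$, i.e., 

$$\gamma = \sum_{u_i : T_{k-1}(u_i) \ge \theta_k} \ \sum_{t < \theta_k} x_{i,t}.$$

It follows that $\rho \le \gamma$. 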

2. Next, we show that the number of nodes moved to a timestep earlier than $\theta_k$ in $T_k$ is at least the gap $\rho$ with probability at least $\epsilon/n$. It follows that condition (C.2) holds with probability at least $\epsilon/n$. 

Lemma 3.9. Let 

$$p = |\{u_i : (T_{k-1}(u_i) \ge \theta_k) \wedge (T_k(u_i) < \theta_k)\}|$$

be the number of nodes that are moved to a timestep earlier than $\theta_k$ in the activation sequence $T_k$ (relative to $T_{k-1}$). Then $\Pr[p \ge \gamma] \ge \epsilon/n$. 

This lemma carries the majority of the substance of this part of the proof. We again need to combine a network flow interpretation with probabilistic analysis, in a manner that is similar to, but more sophisticated than, the analysis for Lemma 3.6. The main complication is that in Lemma 3.6 the right-hand side of inequality (4) is constant, while here we must bound the sum of the masses through the hypergraph nodes in $R$ by a non-constant value that is related to $\gamma$ (and hence to the gap). The complete proof is presented in Appendix H.6. 
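To make the quantities in Lemmas 3.8 and 3.9 concrete, the Python sketch below computes the gap $\rho$ of (5) and the mass $\gamma$ from an activation function and an LP solution. The data layout (`T_prev[i]` for $T_{k-1}(u_i)$, with `float("inf")` for nodes that never activate, and `x[i][t-1]` for $x_{i,t}$) is a hypothetical choice for illustration.

```python
def gap_and_mass(T_prev, x, theta_k):
    """Return (rho, gamma) at the start of stage k.

    T_prev[i] -- T_{k-1}(u_i), or float("inf") if u_i is not yet active
    x[i][t-1] -- x_{i,t} from the LP solution (hypothetical layout)
    theta_k   -- the k-th threshold value
    """
    active = sum(1 for t in T_prev if t <= theta_k - 1)
    rho = theta_k - 1 - active  # equation (5)
    gamma = sum(
        sum(x[i][: theta_k - 1])  # mass at timesteps t = 1, ..., theta_k - 1
        for i, t in enumerate(T_prev)
        if t >= theta_k
    )
    return rho, gamma
```

In the test example below, two of four nodes are active by time 3 and $\theta_k = 4$, so $\rho = 1$, while the two inactive nodes carry mass 0.5 each before time 4, so $\gamma = 1$ and indeed $\rho \le \gamma$.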

Thus, after the reconciliation procedure, $S_\ell$ has size as in equation (3) and is consistent with the activation sequence $T_\ell$. Since $T_\ell$ is feasible, $S_\ell$ is also feasible, and the theorem follows. □ 

4 Lower bounds 

We present two lower bounds. First, we show an $\Omega(\log n)$ inapproximability lower bound for polynomial-time algorithms, assuming that no efficient $(1 - o(1))\ln n$-approximation algorithm for set cover exists. This lower bound holds even when $r$ (the diameter of the graph) and $\ell$ (the threshold granularity) are constants, and implies that our algorithm is close to optimal for a wide range of 
interesting diffusion problems described in Section 1 where $r$ and $\ell$ are $O(1)$: 
Table 1: Comparison of the IP of Figure 2 to several heuristics, for each threshold step length. 



Lemma 4.1. There is no $c \ln n$-approximation algorithm (for some constant $c$) for the technology diffusion optimization problem on a general graph, even if the seedset is required to be connected and the graph diameter $r$ and threshold granularity $\ell$ are $O(1)$. 

The proof, in Appendix D, is a reduction that takes an $\alpha$-approximation algorithm for the technology diffusion problem and returns an $O(\alpha)$-approximation algorithm for set cover. The lemma follows because set cover cannot be approximated within a factor of $c \ln n$ for some constant $c > 0$ (see [2] and references therein). 

Second, we show an unconditional $\Omega(r)$-inapproximability lower bound for the family of algorithms that only search for connected seedsets (i.e., seedsets that induce a connected subgraph of $G$), which explains our approximation algorithm's dependence on the graph diameter $r$: 

Lemma 4.2. For any fixed integer $r$, there exists an instance $(G, \theta)$ of the technology diffusion problem such that (a) the diameter of $G$ is $O(r)$, and (b) the optimal connected seedset is a factor $\Omega(r)$ larger than the optimal seedset. 

The proof is in Appendix E. It follows that circumventing polynomial dependencies on the graph diameter $r$ requires algorithms that can return disconnected seedsets. As discussed in Section 3.1, we believe that doing this will require substantially different techniques, and is thus an interesting direction for future work. 

5 Going beyond heuristics 

Given the prevalence of heuristics like "choose the high-degree nodes" in the literature on technology diffusion in communication networks (e.g., [6, 3, 13]), we sanity-check our approach against several heuristics. We emphasize that our goal in the following is to give evidence that we can find solutions that are substantially different from those of natural heuristics. 

We considered problem instances where (a) $G(V, E)$ is a 200-node preferential attachment graph with node outdegree randomly chosen from $\{1, 2, 3, 4\}$ [1], and (b) thresholds $\theta$ are randomly chosen from $\{\max\{2, c\}, 2c, 3c, \ldots\}$. We ran four groups of experiments with the threshold step-length parameter $c$ fixed to 1, 5, 10, and 20, respectively. Within each group, we used a fresh random preferential attachment graph and repeated the experiment five times, each with a fresh random instance of the threshold functions. We solve each of these 20 problem instances using the IP formulation presented in Figure 2 (with the extra restriction that the highest-degree node must be part of the seedset) and the Gurobi IP solver. We compare the result against five heuristics that iteratively pick a node $u$ with property X from the set of inactive nodes, add $u$ to the seedset $S'$, activate $u$, let the resulting cascade activate as many nodes as possible, and repeat until all nodes are active. We instantiate property X as: 

(a) degree: highest degree, 

(b) degree-threshold: highest (degree) × (threshold), 

(c) betweenness: highest betweenness centrality, 

(d) degree discounted: highest degree in the subgraph induced by the inactive nodes [8], 

(e) degree connected: highest degree and connected to the active nodes. 
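The heuristic framework described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for Table 1: the activation rule (an inactive node $u$ activates once some adjacent active component has size at least $\theta(u) - 1$) is our reading of property X1, ties in the score are broken by iteration order, and all names are hypothetical. Property (a) corresponds to `score = degree`; property (e) would additionally restrict candidates to neighbours of active nodes, which is not shown.

```python
from collections import deque


def component_sizes(adj, active):
    """Label each active node with a component id; return (labels, sizes)."""
    label, sizes = {}, []
    for s in active:
        if s in label:
            continue
        label[s] = len(sizes)
        queue, size = deque([s]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w in active and w not in label:
                    label[w] = label[s]
                    queue.append(w)
        sizes.append(size)
    return label, sizes


def cascade(adj, theta, active):
    """Let the diffusion run: an inactive u activates once some adjacent
    active component has size >= theta[u] - 1 (our reading of property X1)."""
    active = set(active)
    while True:
        label, sizes = component_sizes(adj, active)
        for u in adj:
            if u not in active:
                best = max((sizes[label[w]] for w in adj[u] if w in active), default=0)
                if best >= theta[u] - 1:
                    active.add(u)
                    break  # components changed, so recompute them
        else:
            return active  # no node qualified: the cascade has stopped


def greedy_heuristic(adj, theta, score):
    """Seed the inactive node with the highest score, cascade, and repeat."""
    seeds, active = set(), set()
    while len(active) < len(adj):
        u = max((v for v in adj if v not in active), key=score)
        seeds.add(u)
        active = cascade(adj, theta, active | {u})
    return seeds
```

For example, on a star whose leaves have threshold 2, the degree heuristic seeds only the center; on a 4-node path where every threshold is 4, it ends up seeding three nodes.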

For each group, Table 1 presents the average seedset size and the average Jaccard index between the IP seedset $S$ and the heuristic seedset $S'$. We also compute the fraction of nodes in $S$ that are also among the top-$|S|$ nodes in terms of (a) degree (the row denoted "degree overlap") and (b) betweenness centrality ("betweenness overlap"). The results in Table 1 do indeed give evidence that our IP can return seedsets that are substantially different from (and often better than) the seedsets found via heuristics. 
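The two comparison metrics can be sketched in Python as below. The Jaccard index is the standard intersection-over-union; the "overlap" function follows the description above (fraction of $S$ among the top-$|S|$ nodes by a centrality score). How ties in the score and empty seedsets were handled in the experiments is not stated, so the conventions in the comments are assumptions.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two seedsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def top_k_overlap(seeds, score):
    """Fraction of `seeds` that lie among the top-|seeds| nodes by `score`.

    score maps every node to a centrality value (e.g. degree); ties keep the
    dict's order because sorted() is stable, an assumption not in the text.
    """
    k = len(seeds)
    if k == 0:
        return 0.0  # convention for an empty seedset (an assumption)
    top = set(sorted(score, key=score.get, reverse=True)[:k])
    return len(set(seeds) & top) / k
```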
A The simple integer program 



Our IP in Figure 2 is quite complex; in this section we show why we were not able to base our approximation algorithm on the simpler integer program of Figure 1. For the sake of exposition, we suppose that $x_{i,t}$ is a mass that gives a measure of the probability that node $u_i$ activates at time $t$, and refer to the example in Figure 3. 




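Figure 3 (right-hand side), variable assignments over time: 

t = 1: $x_{A,1} = 0.1$ 
t = 2: $x_{B,2} = 0.1$ (because $x_{B,2} \le x_{A,1}$) 
t = 3: $x_{C,3} = 0.1$ (because $x_{C,3} \le x_{B,2}$) 
t = 4: $x_{B,4} = 0.2$ (because $x_{B,4} \le x_{A,1} + x_{C,3}$) 
t = 5: $x_{C,5} = 0.2$ (because $x_{C,5} \le x_{B,2} + x_{B,4}$) 
t = 6: $x_{B,6} = 0.4$ (because $x_{B,6} \le x_{A,1} + x_{C,3} + x_{C,5}$) 
t = 7: $x_{C,7} = 0.7$ (because $x_{C,7} \le x_{B,2} + x_{B,4} + x_{B,6}$) 
t = 8: $x_{B,8} = 0.3$ (because $x_{B,8} \le x_{A,1} + x_{C,3} + x_{C,5} + x_{C,7}$) 
t = 9: $x_{A,9} = 0.9$ (because $x_{A,9} \le \sum_{t' \le 8} x_{B,t'}$) 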

Figure 3: Here, the solution returned by the relaxed LP is unlikely to be helpful in rounding. 



Consider Figure 3. Suppose the LP returns a solution such that at $t = 1$, node $A$ has mass 0.1, while all other nodes have mass 0. The constraints repeatedly allow mass from node $A$ to circulate through nodes $B$ and $C$ and then back to $A$. (See the right-hand side of Figure 3 for the variable assignments over time.) Finally, at $t = 9$, enough mass has circulated back to $A$ that $A$ has mass 0.9, i.e., "probability" 0.9 of activating. Note that this is highly artificial, as all of this mass originated at $A$ to begin with! In fact, no matter how we interpret these $x_{i,t}$, the example suggests that this "recirculation of mass" is not going to give us any useful information about when node $A$ should actually activate. 
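The recirculation in Figure 3 can be checked with exact arithmetic. The Python sketch below reads each inequality off the parenthetical "because" annotations in Figure 3. Two points are assumptions on our part: treating these inequalities as the constraints of the simple IP (Figure 1 is not reproduced here), and taking the graph to be the path A-B-C that the annotations imply. The sketch confirms that every assignment is allowed and that each node, including $A$, ends with total mass exactly 1, although $A$'s second unit of mass is nothing but its own recirculated mass.

```python
from fractions import Fraction as F

# Path A-B-C implied by the annotations of Figure 3 (B is adjacent to A and C).
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
# Nonzero masses x_{i,t} from Figure 3, as exact fractions.
x = {("A", 1): F("0.1"), ("B", 2): F("0.1"), ("C", 3): F("0.1"),
     ("B", 4): F("0.2"), ("C", 5): F("0.2"), ("B", 6): F("0.4"),
     ("C", 7): F("0.7"), ("B", 8): F("0.3"), ("A", 9): F("0.9")}


def incoming_mass(node, t):
    """Mass on the node's neighbours at timesteps strictly before t."""
    return sum((m for (j, s), m in x.items() if j in neighbors[node] and s < t), F(0))


# Every nonzero assignment after t = 1 satisfies its Figure 3 inequality ...
assert all(m <= incoming_mass(i, t) for (i, t), m in x.items() if t > 1)
# ... and every node ends up with total mass exactly 1.
totals = {v: sum((m for (j, _), m in x.items() if j == v), F(0)) for v in neighbors}
print(totals)
```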

We took care of this recirculation of mass by introducing the timestamped diffusion tree and the flow constraints, which prevent mass from circulating from a node back to itself at some future time (e.g., from node $A$ at time $t = 1$ back to itself at time $t = 9$). To illustrate how the flow constraints eliminate the pathological example of Figure 3, in Figure 4 we present the hypergraph $\mathcal{H}$ corresponding to the first 4 timesteps of the activation sequence shown in Figure 3. Notice that $\mathcal{H}$ violates the constraints of the IP in Figure 2 because the $(B,4)$-flow problem has total demand 0.2 (i.e., $x_{B,2} = 0.1$ and $x_{B,4} = 0.1$), but there is no way to supply this demand from $x_{A,1}$. 
Figure 4: The hypergraph $\mathcal{H}$ corresponding to the activation sequence of Figure 3. 

B Some Examples 

We now present a few examples of the constructions we used in our main IP in Figure 2 and its relaxation, as described in Section 3.4. 

We start with a technology diffusion problem in Figure 5 and present an optimal timestamped diffusion tree and activation function $T$ for this problem in Figure 6. The hypergraph $\mathcal{H}$ that would result from an integer solution to the IP is presented in Figure 7. Notice that the edges in $\mathcal{H}$ form a tree corresponding to the timestamped diffusion tree in Figure 6. Meanwhile, the hypergraph $\mathcal{H}$ in Figure 8 is constructed from a fractional solution of the LP for the problem in Figure 5. Notice that the edges in $\mathcal{H}$ do not correspond to a tree (so that in the LP relaxation, we no longer have the assurance that $\mathcal{H}$ encodes a timestamped diffusion tree); however, $\mathcal{H}$ does not contain any violations of the flow constraints. 

We now use $\mathcal{H}$ in Figure 8 to illustrate our rounding procedure. In particular, we illustrate two "failure modes" where, at intermediate stages of our algorithm, the seedset $S$ and activation sequence $T$ violate one of the three conditions in Section 3.4.1, as well as a "success mode" where $S$ and $T$ adhere to all three of these conditions. 
The optimal solution for the IP: 
$x_{A,1} = 1$ ($x_{A,t} = 0$ for all $t \ne 1$) 
$x_{B,2} = 1$ ($x_{B,t} = 0$ for all $t \ne 2$) 
$x_{C,3} = 1$ ($x_{C,t} = 0$ for all $t \ne 3$) 
$x_{D,5} = 1$ ($x_{D,t} = 0$ for all $t \ne 5$) 
$x_{E,6} = 1$ ($x_{E,t} = 0$ for all $t \ne 6$) 
$x_{F,4} = 1$ ($x_{F,t} = 0$ for all $t \ne 4$) 



Figure 6: A solution for the technology diffusion problem in Figure 5 (on the right-hand side) and the corresponding timestamped diffusion tree (on the left-hand side). 



Figure 7: The hypergraph $\mathcal{H}$ corresponding to the diffusion tree in Figure 6. 
Figure 8: The hypergraph $\mathcal{H}$ obtained from the solution of the relaxed program for the problem in Figure 5. 



Figure 9: Setting the seedset $S = \{A\}$ and using the flow graph from $\mathcal{H}$, the flag variables are activated along the solid trajectories. We have $T(A) = 1$, $T(B) = 5$, $T(C) = 3$, $T(D) = 5$, $T(E) = 6$, $T(F) = \perp$. 
Failure 1: Infeasible activation function. Let $S_0 \leftarrow$ Prelim-Seedset($\mathcal{H}$) and $T_0 \leftarrow$ Get-Seq($\mathcal{H}, S_0$) for the hypergraph $\mathcal{H}$ in Figure 8. Suppose $S_0 = \{A\}$. Figure 9 shows the update of the flag variables $b_{i,t}$ inside Get-Seq using seedset $S_0 = \{A\}$. This example gives us 

• $S_0 = \{A\}$. 

• $T_0(A) = 1$, $T_0(B) = 5$, $T_0(C) = 3$, $T_0(D) = 5$, $T_0(E) = 6$, $T_0(F) = \perp$. 
That is, $T_0 = (A, \perp, C, \perp, \{D, B\}, E)$. 

First, note that the flag variables are activated along the solid trajectories, and that even though hypergraph node $(C, 3)$ is flagged as active, there is no solid trajectory from $(C, 3)$ to $(F, 4)$; this is because $F \notin S_0$ and Get-Seq only activates nodes to the right of the threshold line. Moreover, observe that Get-Seq can sometimes leave certain timesteps in the activation function $T_0$ empty, as with timesteps 2 and 4 in the example above. Finally, note that in this example, our activation function $T_0$ is infeasible; that is, node $F$ never turns on! Thus, this example represents a failed run of Prelim-Seedset that violates condition (P.1), so we must roll back Prelim-Seedset and re-execute it until it returns a feasible activation sequence (i.e., one where all nodes eventually turn on). 

Failure 2: Inconsistent preliminary seedset and activation function. Suppose now that we re-ran Prelim-Seedset until it returned the preliminary seedset $S_0 = \{A, F\}$. Returning to Figure 9, Get-Seq would now flag $(F, 1), (F, 2), (F, 3), (F, 4), (F, 5)$ as active (because $F$ is a seed), and we would have additional solid trajectories from hypergraph node $(F, 4)$ to hypergraph nodes $(C, 5)$, $(F, 5)$ and $(F, 6)$. Since we consider nodes to be active at the timestep corresponding to their earliest active hypergraph node, our activation function would become: 

$T_0 = (\{A, F\}, \perp, C, \perp, \{B, D, E\}, \perp)$ 

First, note that our activation sequence is now feasible (every node eventually turns on in $T_0$), so $S_0$ and $T_0$ would be accepted as a preliminary seedset and activation sequence. However, observe that $S_0$ and $T_0$ are inconsistent (as defined in Section 3.4.1). To see why, note that $T_0$ has node $D$ activating at timestep 4. Referring to the threshold line, we note that $D$ has threshold 4, and according to $S_0$, $D$ is not a seed. However, $T_0$ indicates that just before timestep 4 there are only three active nodes ($A$, $F$, $C$). This is precisely why $S_0$ and $T_0$ are inconsistent: they suggest that a non-seed node activates prematurely, i.e., when the number of active nodes is less than its threshold requires! 

This is where our reconciliation procedure comes in: as we discuss in Section 3.4.4, our reconciliation procedure iteratively adds nodes to the seedset until the resulting activation function and seedset become consistent. We note that in this example, it suffices for the reconciliation procedure to add either node $B$, $D$ or $F$ to the preliminary seedset. 

Success: Feasible and consistent seedset and activation function. Finally, suppose the seedset becomes $\{A, B, F\}$. Then the activation function becomes 

$T = (\{A, B, F\}, \perp, C, \perp, \{D, E\}, \perp)$ 

which is the sort of activation function that we would like to have at the termination of the 
reconciliation procedure, since it is both feasible (every node eventually turns on) and consistent 
(non-seed nodes never activate prematurely). 
C Optimal connected activation sequences provide a 2-approximation 



Recall that a connected activation sequence T is such that the set of active nodes at any 
timestep t induces a connected subgraph of G, while a connected seedset is such that all nodes 
in S induce a connected subgraph of G. Notice that requiring the activation sequence T to be 
connected is weaker than requiring a connected seedset S: since T allows a seed to activate after a 
non-seed, the connectivity of T can be preserved by non-seeds whose activation time occurs between 
the activation times of the seed nodes. 

We now show that the smallest seedset that gives rise to a feasible connected activation sequence is at most twice the size of the optimal seedset opt. 

Proof of Lemma 3.1. Given an optimal activation sequence $T_{\mathrm{opt}}$ and a seedset of size opt, we transform them into a connected activation sequence $T$. Along the way, we add nodes to the seedset in a manner that increases its size by a factor of at most 2. 

Notation. Let $G_i(T)$ be the subgraph induced by the first $i$ active nodes in $T$. We say a node $u$ is a connector in some activation sequence $T$ if the activation of $u$ in $T$ connects two or more disconnected components in $G_{T(u)-1}(T)$ into a single component. 

Creating a connected activation sequence. Notice that an activation sequence $T(\cdot)$ is connected if and only if there exists no connector in the sequence. Thus, it suffices to iteratively "remove" connectors from $T$ until no more connectors remain. 
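Finding the connectors of a given activation sequence is a short union-find computation. The Python sketch below is our own illustration (the names and the adjacency-dict representation are hypothetical): it processes nodes in activation order and reports each node whose already-active neighbours lie in two or more distinct components. For a complete activation sequence on a connected graph $G$, an empty result corresponds to the "no connector" condition above.

```python
def connectors(adj, order):
    """Return the connectors of an activation sequence.

    adj   -- dict mapping each node to its neighbours in G
    order -- nodes in activation order, one per timestep
    """
    parent = {}  # union-find forest over the nodes activated so far

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    found = []
    for u in order:
        roots = {find(w) for w in adj[u] if w in parent}  # components u touches
        parent[u] = u
        for r in roots:
            parent[r] = u  # merge every touched component into u's
        if len(roots) >= 2:
            found.append(u)
    return found


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(connectors(path, ["a", "c", "b"]))  # ['b']
```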

To do this, we initialize our iterative procedure by setting $\mathcal{T} \leftarrow \mathcal{T}_{opt}$. Each step of our procedure then finds the earliest connector $u$ to activate in $\mathcal{T}$, adds $u$ to the seedset, and applies the following two transformations:

Transformation 1: First, we transform $\mathcal{T}$ so that every component in $G_{\mathcal{T}(u)}(\mathcal{T})$ is directly connected to $u$. Let $D(u)$ be the subsequence of $\mathcal{T}$ such that every node in $D(u)$ both activates before $u$ and is part of a component in $G_{\mathcal{T}(u)}(\mathcal{T})$ that is not connected to $u$. Transform $\mathcal{T}$ so that the subsequence $D(u)$ appears immediately after node $u$ activates. (This does not harm the feasibility of $\mathcal{T}$, because the nodes in $D(u)$ are disconnected from the other nodes in $G_{\mathcal{T}(u)}(\mathcal{T})$ that activate before $u$.)

Transformation 2: Next, we transform the activation sequence so that it is connected up to time $\mathcal{T}(u)$. To see how this works, assume that there are only two connected components $C_1$ and $C_2$ in $G_{\mathcal{T}(u)-1}(\mathcal{T})$, where $|C_1| \ge |C_2|$. Our transformation is as follows:

1. First, activate the nodes in $C_1$ as in $\mathcal{T}(\cdot)$.

2. Then, activate $u$. (This does not harm feasibility because we added $u$ to the seedset. Connectivity is ensured because $u$ is directly connected to $C_1$.)

3. Finally, have all the nodes in $C_2$ activate immediately after $u$; the ordering of the activations of the nodes in $C_2$ may be arbitrary as long as it preserves connectivity. (This does not harm feasibility because (a) seed nodes may activate at any time, and (b) any non-seed $v \in C_2$ must have threshold $\theta(v) \le |C_2| \le |C_1|$, and our transformation ensures that at least $|C_1| + 1$ nodes are active before any node in $C_2$ activates.)

We can easily generalize this transformation to the case where $k$ components are connected by $u$ by letting $|C_1| \ge |C_2| \ge \dots \ge |C_k|$ and repeating step (3) $k-1$ times. At this point the transformed activation sequence is feasible and connected up to time $t = 1 + |C_1| + |C_2| + \dots + |C_k|$.
Seedset growth. It remains to bound the growth of the seedset due to our iterative procedure. We do this in three steps. First, we observe that the number of extra nodes we added to the seedset is bounded by the number of steps in our iterative procedure. Next, we iteratively apply the following claim (proved later) to argue that the number of steps in our iterative procedure is upper bounded by the number of connectors in the optimal activation sequence $\mathcal{T}_{opt}$:

Claim C.1. Let $\mathcal{T}_j$ be the activation sequence at the start of the $j$th step. The number of connectors in $\mathcal{T}_{j+1}$ is less than the number of connectors in $\mathcal{T}_j$.

Thus, it suffices to bound the number of connectors in $\mathcal{T}_{opt}$. Our third and final step is to show that the number of connectors in $\mathcal{T}_{opt}$ is bounded by |opt|. To do this, we introduce a potential function $\Phi(t)$ that counts the number of disconnected components in $G_t(\mathcal{T}_{opt})$, and argue the following:

• For every connector $u$ that activates at time $t$ in $\mathcal{T}_{opt}$ and joins two or more disconnected components, there is a corresponding decrement in $\Phi$, i.e., $\Phi(t) \le \Phi(t-1) - 1$.

• Next, we have that $\Phi(1) = \Phi(|V|) = 1$, since at the first timestep there is only one active node, and at the last timestep all the nodes in the graph are active and form a single giant component. Thus, for every unit decrement in $\Phi$ at some time $t$, there is a corresponding unit increment in $\Phi$ at some other time $t'$.

• Finally, for any unit increment in $\Phi$, i.e., $\Phi(t') = \Phi(t'-1) + 1$, it follows that a new disconnected component appears in $G_{t'}(\mathcal{T}_{opt})$. This implies that a new seed activates at time $t'$. Thus, it follows that the number of unit decrements of $\Phi$ is upper bounded by the size of the seedset |opt|.
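The counting argument can be replayed mechanically on a small example. The Python sketch below tracks $\Phi(t)$ with union-find over an activation order (again assumed to be one node per timestep; `potential_trajectory` is a hypothetical name). It also records the nodes that activate with no active neighbour, each of which opens a new component and therefore must be a seed, and the connectors; since $\Phi$ starts and ends at 1, the number of connectors is at most the number of such isolated activations minus one.

```python
def potential_trajectory(adj, order):
    """Return (Phi(1..|V|), isolated activations, connectors) for `order`."""
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    phi, trajectory, isolated, conn = 0, [], [], []
    for u in order:
        parent[u] = u
        roots = {find(v) for v in adj[u] if v in parent}
        if not roots:
            isolated.append(u)   # opens a new component: Phi increases by one
        elif len(roots) >= 2:
            conn.append(u)       # merges components: Phi decreases
        phi += 1 - len(roots)
        for r in roots:
            parent[r] = u
        trajectory.append(phi)
    return trajectory, isolated, conn


# Path a - b - c - d, activated in the order a, d, c, b.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
traj, iso, conn = potential_trajectory(adj, ["a", "d", "c", "b"])
print(traj, iso, conn)  # [1, 2, 2, 1] ['a', 'd'] ['b']
```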

Thus, we may conclude that the number of connectors added to the seedset in our iterative procedure is upper bounded by the number of connectors in $\mathcal{T}_{opt}$, which is upper bounded by the size of the optimal seedset opt, and the lemma follows. □

The correctness of Claim C.1 is fairly intuitive, given that our transformations always preserve the ordering of the nodes that are not in the components joined by node $u$. We include the proof for completeness.

Proof of Claim C.1. We make use of the following observation:

Observation 1: If two activation sequences $\mathcal{T}$ and $\mathcal{T}'$ have a common suffix, i.e., $\mathcal{T} = \mathcal{T}'$ for timesteps $r, r+1, \dots, |V|$, then $\mathcal{T}$ and $\mathcal{T}'$ contain the same number of connectors after time $r-1$.

Let $t = \mathcal{T}_j(u)$. By construction, no connectors exist in $\mathcal{T}_j$ prior to time $t$. Furthermore, we can use Observation 1 to argue that $\mathcal{T}_j$ and $\mathcal{T}_{j+1}$ contain the same number of connectors after time $t$. Thus, it suffices to show that Transformations 1 and 2 in the $j$th step of our iterative procedure do not introduce new connectors that activate prior to time $t$.

Let $\mathcal{T}^*$ be the activation sequence after Transformation 1 in the $j$th step of our iterative procedure, and let $t' = \mathcal{T}^*(u)$. We can see that (1) no new connectors activate before time $t'$ in $\mathcal{T}^*$ (since, before $t'$, our construction ensures that $\mathcal{T}^*$ consists only of disconnected active components that are joined by $u$), and (2) no new connectors activate between time $t'+1$ and $t$ inclusive (since (a) $u$ was chosen as the earliest connector in $\mathcal{T}_j$, and (b) Transformation 1 preserves the order of the nodes that activate between time $t'+1$ and $t$ inclusive in $\mathcal{T}^*$).
Finally, we conclude by arguing that Transformation 2 cannot introduce new connectors by
(1) applying Observation 1 to the nodes after t' and (2) observing that after Transformation 2, the 
nodes that activate before t' create a single connected component, and thus by definition cannot 
contain any connectors. □ 

D Reduction to Set Cover 

Let us recall the definition (of the optimization version) of the set cover problem: given a finite universe $\mathcal{U}$ and a family $\mathcal{S}$ of subsets of $\mathcal{U}$, we are interested in finding the smallest subfamily $\mathcal{T} \subseteq \mathcal{S}$ such that $\mathcal{T}$ is a cover of $\mathcal{U}$, i.e., $\bigcup_{T \in \mathcal{T}} T = \mathcal{U}$. Because this problem cannot be approximated within a factor of $o(\ln n)$ (see [2] and references therein), the following result proves Lemma 4.1.

Lemma D.1. Given an $\alpha$-approximation algorithm for the technology diffusion problem with a constant number of threshold values $\ell \ge 2$ and constant graph diameter $r \ge 3$, we can obtain an $O(\alpha)$-approximation algorithm for the set cover problem. Moreover, the reduction holds even if the seedset in the technology diffusion problem is required to be connected.

We remark that the main difficulty in constructing the reduction is that our utility function is non-local (i.e., a node may decide to activate because a remote node activated), while in typical NP-complete problems, constraints are usually expressed in local form (i.e., they only depend on a small number of variables). To encode constraints of an NP-complete problem into gadgets for a technology diffusion problem, we need to carefully insulate "influences" across different vertices so that node activations do not trigger an unplanned cascade. We do this using a padding argument. Roughly speaking, to protect against inadvertent activations of a vertex $v$, we replicate the vertices $u$ that are supposed to activate $v$ so that they block influences from other possibly-activated vertices connected to $v$.

Proof of Lemma D.1. Let us consider an arbitrary set cover instance $(\mathcal{U}, \mathcal{T})$, where $m = |\mathcal{T}|$ is the number of sets in $\mathcal{T}$ and $n = |\mathcal{U}|$ is the number of elements.

The reduction. We construct a technology diffusion problem as described below and illustrated in Figure 10.

• The vertex set consists of the following types of vertices:

1. The set type: for each $T \in \mathcal{T}$, we construct a vertex $u_T$ in the technology network.

2. The element type: for each $e \in \mathcal{U}$, we construct $m+1$ vertices $u_{e,1}, u_{e,2}, \dots, u_{e,m+1}$.

• The edge set consists of the following edges:

1. For each $T \in \mathcal{T}$ and $e \in T$, we add the edges $\{u_T, u_{e,1}\}, \{u_T, u_{e,2}\}, \dots, \{u_T, u_{e,m+1}\}$.

2. The set type vertices are connected as a clique. (For each $T \neq T' \in \mathcal{T}$, we add the edge $\{u_T, u_{T'}\}$.)

• The thresholds $\theta(\cdot)$ are set as follows:

1. For any $e \in \mathcal{U}$ and $i \le m+1$, we set $\theta(u_{e,i}) = 2$.

2. For every $T \in \mathcal{T}$, we set $\theta(u_T) = (m+1)n + 1$.
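To make the construction concrete, here is a small Python sketch that builds the instance from a toy set cover instance and runs the cascade. It assumes the activation rule used in the worked arguments of this appendix, namely that an inactive node $u$ activates once the active component it would join, counting $u$ itself, has at least $\theta(u)$ nodes (our reading of the model). The function names are hypothetical, and the labels `("set", j)` and `("elem", e, i)` stand for $u_{T_j}$ and $u_{e,i}$.

```python
from itertools import combinations


def simulate(adj, theta, seeds):
    """Final active set, assuming u activates once the active component it
    would join (counting u itself) has at least theta[u] nodes."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for u in adj:
            if u in active:
                continue
            seen, stack = {u}, [u]  # u plus all active nodes reachable through active nodes
            while stack:
                x = stack.pop()
                for y in adj[x]:
                    if y in active and y not in seen:
                        seen.add(y)
                        stack.append(y)
            if len(seen) >= theta[u]:
                active.add(u)
                changed = True
    return active


def build_instance(universe, sets):
    """Set-type vertices (a clique) plus m + 1 copies of every element."""
    m, n = len(sets), len(universe)
    adj, theta = {}, {}

    def add_edge(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for j in range(m):
        adj[("set", j)] = set()
        theta[("set", j)] = (m + 1) * n + 1
    for e in universe:
        for i in range(1, m + 2):
            adj[("elem", e, i)] = set()
            theta[("elem", e, i)] = 2
            for j, T in enumerate(sets):
                if e in T:
                    add_edge(("set", j), ("elem", e, i))
    for j, k in combinations(range(m), 2):
        add_edge(("set", j), ("set", k))
    return adj, theta


universe = {1, 2, 3}
sets = [{1, 2}, {2, 3}, {3}]
adj, theta = build_instance(universe, sets)
print(len(adj))                                             # 15
print(len(simulate(adj, theta, {("set", 0), ("set", 1)})))  # 15: a cover activates everything
print(len(simulate(adj, theta, {("set", 0)})))              # 9: element 3 stays uncovered
```

With the single seed $u_{T_0}$, the $m+1 = 4$ copies of the uncovered element stay inactive, and they are exactly what keeps the other set vertices below their threshold $(m+1)n + 1 = 13$.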
Properties of the reduction. Notice that our technology diffusion problem has only two types of threshold values. Furthermore, the diameter of the graph we form is exactly 3 hops (in terms of edges); the maximum distance in this graph is from one $u_{e,i}$ node to another. Finally, we show below that the seedset can be assumed to consist only of set-type vertices. Since these vertices form a clique, it follows that the seedset can be taken to be connected.

Correctness. To conclude that the size of the optimal seedset is the same as the size of the optimal cover (which also means that our reduction is approximation-preserving), we establish the following:

Item 1. For any feasible cover $\mathcal{S}$ in the set cover problem, the corresponding seedset $\{u_S : S \in \mathcal{S}\}$ is a feasible solution for the technology diffusion problem.

Item 2. Any feasible seedset in the technology diffusion problem that consists only of set-type vertices corresponds to a feasible cover in the set cover problem.

Item 3. Given a feasible seedset that contains element-type vertices, there is a feasible seedset of equal or smaller size that consists only of set-type vertices. Since the set-type vertices form a clique, we have that the optimal solution for the technology diffusion problem is also a connected one.

Item 1. To show the first item, we simply walk through the activation process. When $\mathcal{S}$ is a cover, let the seedset be $u_{T_j}$ for all $T_j \in \mathcal{S}$. Notice that this seedset is connected. Upon activating the seedset, the vertices $u_{e,i}$ for all $e \in \mathcal{U}$ and $i \le m+1$ are activated because each of them is connected to at least one active seed. Now there are $(m+1)n$ active element-type nodes, so the rest of the set-type vertices are activated.

Item 2. To show the second item, we consider an arbitrary seedset that consists only of set-type vertices, $U = \{u_{T_1}, u_{T_2}, \dots, u_{T_k}\}$, where $T_1, \dots, T_k \in \mathcal{T}$. We show that if $T_1, \dots, T_k$ is not a cover, then the seedset cannot be feasible (i.e., some nodes will remain inactive in the technology diffusion problem).
Let $e \in \mathcal{U} \setminus \big(\bigcup_{j \le k} T_j\big)$ be an element that is not covered by the sets in $\{T_1, \dots, T_k\}$. Let us consider the vertices $u_{e,1}, u_{e,2}, \dots, u_{e,m+1}$, and the vertex $u_T$ for each $T \notin \{T_1, \dots, T_k\}$, in the technology diffusion problem. We claim that none of these vertices will be activated with seedset $U$. Suppose, for the sake of contradiction, that one or more of these vertices are activated, and consider the first activated vertex among them. There are two cases:

Case 1. $u_T$ ($T \notin \{T_1, \dots, T_k\}$) is activated first. This is impossible: when the $u_{e,i}$ ($i \le m+1$) are not activated, the number of activated nodes is at most $(n-1)(m+1) + m < (m+1)n$.

Case 2. $u_{e,i}$ ($i \le m+1$) is activated first. This is impossible because $u_{e,i}$ is only connected with vertices $u_T$ where $T \notin \{T_1, \dots, T_k\}$, and none of these set-type vertices are activated.

Item 3. Finally, we move on to the third item. Let us consider a feasible seedset $F$ that does not consist only of set-type vertices. We show that we can easily remove the element-type vertices in $F$: let $u_{e,i}$ be an arbitrary element-type vertex in $F$. Then we can remove $u_{e,i}$ from $F$ and add a vertex $u_T$ to $F$ such that $e \in T$. This does not increase the cardinality of $F$. Furthermore, $u_{e,i}$ would still be activated, which implies that the updated $F$ is still a feasible seedset. □

E Connectivity implies dependence on graph diameter r 

We now prove Lemma 4.2, which shows that any algorithm that considers only connected seedsets suffers a factor of $r$ loss in the approximation ratio:

Proof of Lemma 4.2. Let $r$ be an arbitrary, sufficiently large integer. Let us define a line graph $G_r$ as follows (Figure 11):

• The vertex set is $\{v_1, \dots, v_{2r+1}\}$.

• The edge set is $\{\{v_i, v_{i+1}\} : 1 \le i \le 2r\}$.
The threshold function is defined as follows:

• $\theta(v_1) = \theta(v_{2r+1}) = 2$ and $\theta(v_{r+1}) = 2r+1$.

• For $1 < i \le r$, $\theta(v_i) = i$.

• For $r+2 \le i \le 2r$, $\theta(v_i) = 2r+2-i$.

It is straightforward to see that the diameter of the graph is $2r = O(r)$. It remains to verify that the optimal connected solution is $\Omega(r)$ times larger than the optimal solution.

It is easy to see that $\{v_1, v_{2r+1}\}$ is a feasible seedset, and therefore the size of the optimal seedset is $O(1)$. We next show that any feasible connected seedset has size $\Omega(r)$.




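Figure 11: An instance of the technology diffusion problem for the proof of Lemma 4.2 (the path $v_1, \dots, v_{2r+1}$ with thresholds $\theta_1 = 2$, $\theta_2 = 2$, $\theta_3 = 3$, ..., $\theta_r = r$, $\theta_{r+1} = 2r+1$, $\theta_{r+2} = r$, ..., $\theta_{2r-1} = 3$, $\theta_{2r} = 2$, $\theta_{2r+1} = 2$).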
Since the seedset must be connected, without loss of generality we can assume that the seedset is $\{v_i, v_{i+1}, \dots, v_j\}$ and, by symmetry, $i \le r+1$. When $j < r+1$, node $v_{r+1}$ will never activate: because $v_{r+1}$ has threshold $2r+1$, it only activates when all other nodes are active, but in this case all $r$ nodes to the right of $v_{r+1}$ remain inactive. It follows that a feasible seedset requires $j \ge r+1$.

When $i = 1$, the size of the seedset is $\Omega(r)$ and the lemma follows. So we need only consider the case where $i > 1$: symmetry allows us to assume without loss of generality that $r+1-i \ge j-(r+1)$, i.e., $\theta(v_{j+1}) \ge \theta(v_{i-1})$. Therefore, since we have $j-i+1$ nodes in the seedset, a necessary condition for this seedset to be feasible is $j-i+1 \ge i-2$. Using the fact that $j \ge r+1$, we get $i < r/2 + 2$ and $j - i = \Omega(r)$, which completes our proof. □
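The gap can be observed numerically on small instances. The Python sketch below builds $G_r$, confirms that $\{v_1, v_{2r+1}\}$ is feasible, and brute-forces the smallest feasible connected seedset (on a path, connected seedsets are intervals). It assumes the activation rule used throughout these constructions, namely that an inactive node activates once the active component it would join, counting itself, has at least $\theta(u)$ nodes; `line_instance` and `min_connected_seedset` are hypothetical names.

```python
def simulate(adj, theta, seeds):
    """Final active set, assuming u activates once the active component it
    would join (counting u itself) has at least theta[u] nodes."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for u in adj:
            if u in active:
                continue
            seen, stack = {u}, [u]
            while stack:
                x = stack.pop()
                for y in adj[x]:
                    if y in active and y not in seen:
                        seen.add(y)
                        stack.append(y)
            if len(seen) >= theta[u]:
                active.add(u)
                changed = True
    return active


def line_instance(r):
    """Path v_1 - ... - v_{2r+1} with the thresholds of Lemma 4.2."""
    N = 2 * r + 1
    adj = {i: set() for i in range(1, N + 1)}
    for i in range(1, N):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    theta = {1: 2, N: 2, r + 1: N}
    for i in range(2, r + 1):          # 1 < i <= r
        theta[i] = i
    for i in range(r + 2, 2 * r + 1):  # r + 2 <= i <= 2r
        theta[i] = 2 * r + 2 - i
    return adj, theta


def min_connected_seedset(r):
    """Size of the smallest feasible interval seedset {v_a, ..., v_{a+size-1}}."""
    adj, theta = line_instance(r)
    N = len(adj)
    for size in range(1, N + 1):
        for a in range(1, N - size + 2):
            if len(simulate(adj, theta, range(a, a + size))) == N:
                return size


for r in range(2, 9):
    adj, theta = line_instance(r)
    feasible = simulate(adj, theta, {1, 2 * r + 1}) == set(adj)
    print(r, feasible, min_connected_seedset(r))
```

Under the assumed rule, the two-node seedset is feasible for every $r$, while the smallest connected seedset on these instances works out to $\lceil r/2 \rceil$, which grows linearly in $r$ as the lemma predicts.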

One drawback of this construction is that $\ell = \Theta(n)$. We may modify $\theta(\cdot)$ so that $\ell = O(1)$ (thus ensuring that our lower bound depends on the graph diameter $r$, rather than on the number of thresholds $\ell$):

• when $i \le n$, set $\theta(v_i) = \max\{2^{\lfloor \log_2 i \rfloor}, 2\}$,

• when $i = n+1$, set $\theta(v_i) = 2n+1$, and

• when $i > n+1$, set $\theta(v_i) = \max\{2^{\lfloor \log_2 (2n+2-i) \rfloor}, 2\}$.

One can use similar arguments to show that the size of the optimal seedset is $O(1)$, while the size of the optimal connected seedset is $\Omega(r)$.

F Our problem is neither submodular nor supermodular 

We wondered about the relationship between the algorithmic properties of our model and the linear threshold model on social networks articulated in [23]. [7] showed that the problem of selecting an optimal seedset in the linear threshold model in social networks cannot be approximated within a factor of $O(2^{\log^{1-\epsilon}|V|})$ when the thresholds are deterministic and known to the algorithm. [23] got around this lower bound by assuming that nodes' thresholds are chosen uniformly at random after the seedset is selected, and designing an algorithm that chooses the optimal seedset in expectation. Their $(1 - 1/e - \epsilon)$-approximation algorithm relies on the submodularity of the influence function, i.e., the function $f(S)$ which gives the expected number of nodes that activate given that the nodes in $S$ are active.

In this section, we show that algorithmic results for submodular and/or supermodular optimization do not directly apply to our problem, even if we restrict ourselves to (a) graphs of constant diameter, (b) diffusion problems with a small number of fixed thresholds, or (c) thresholds chosen uniformly at random as in [23]. Moreover, we see neither diminishing nor increasing marginal returns even if we restrict ourselves to (d) connected seedsets.

F.1 Fixed threshold case

In this section, we construct two families of technology diffusion instances per the model in 
Definition IH. 81 Each family will be on a graph of diameter at most 4, and require at most 2 different 
threshold values, and each will consider connected seedsets. The first family will fail to exhibit the 
submodularity property while the second will fail to exhibit supermodularity. 
Figure 12: An instance of the technology diffusion problem (thresholds $\theta = n+2$ on both cliques and $\theta(v_{2n+1}) = 2n+1$ on the middle vertex).

Let $(G, \theta)$ be an arbitrary technology diffusion problem. We write $f_{G,\theta}(S)$ for the total number of nodes that are eventually activated after seedset $S$ activates. When $G$ and $\theta$ are clear from the context, we simply refer to $f_{G,\theta}(S)$ as $f(S)$.

F.1.1 The influence function is not submodular. 

Let $n$ be a sufficiently large integer such that the number of vertices in the graph is $2n+1$. This family of technology diffusion problems (which again is implicitly parameterized by $n$) is shown in Figure 12 and defined as follows:

• The vertex set is $\{v_1, v_2, \dots, v_{2n+1}\}$.

• The edge set is constructed as follows:

- The subsets $\{v_1, \dots, v_n\}$ and $\{v_{n+1}, \dots, v_{2n}\}$ form two cliques.

- Vertex $v_{2n+1}$ is connected to all other vertices in the graph, i.e., edges $\{v_1, v_{2n+1}\}, \dots, \{v_{2n}, v_{2n+1}\}$.

• The threshold function is

- for $i \le 2n$, $\theta(v_i) = n+2$;

- $\theta(v_{2n+1}) = 2n+1$.

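To show this problem is not submodular, we find two disjoint sets $S_1$ and $S_2$ such that

$$f(S_1) + f(S_2) < f(S_1 \cup S_2). \tag{6}$$

We choose $S_1 = \{v_1, \dots, v_n\}$ and $S_2 = \{v_{2n+1}\}$. Note that $S_1$ and $S_2$ are connected, and that $f(S_1) = n$ (vertex $v_{2n+1}$ would join a component of only $n+1 < 2n+1$ nodes, and the second clique is reachable only through $v_{2n+1}$) and $f(S_2) = 1$, while $f(S_1 \cup S_2) = 2n+1$ (any $v_i$ with $i > n$ joins a component of $n+2$ nodes), so that (6) holds. □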
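The counts above can be reproduced by simulation. This Python sketch assumes the activation rule implicit in these counts (an inactive node activates once the active component it would join, counting itself, has at least $\theta(u)$ nodes) and uses the hypothetical helper `hub_instance` to build the Figure 12 family.

```python
from itertools import combinations


def simulate(adj, theta, seeds):
    """Final active set, assuming u activates once the active component it
    would join (counting u itself) has at least theta[u] nodes."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for u in adj:
            if u in active:
                continue
            seen, stack = {u}, [u]
            while stack:
                x = stack.pop()
                for y in adj[x]:
                    if y in active and y not in seen:
                        seen.add(y)
                        stack.append(y)
            if len(seen) >= theta[u]:
                active.add(u)
                changed = True
    return active


def hub_instance(n):
    """Cliques on v_1..v_n and v_{n+1}..v_{2n}; hub v_{2n+1} adjacent to all."""
    N = 2 * n + 1
    adj = {i: set() for i in range(1, N + 1)}
    for block in (range(1, n + 1), range(n + 1, 2 * n + 1)):
        for a, b in combinations(block, 2):
            adj[a].add(b)
            adj[b].add(a)
    for i in range(1, 2 * n + 1):
        adj[i].add(N)
        adj[N].add(i)
    theta = {i: n + 2 for i in range(1, 2 * n + 1)}
    theta[N] = N
    return adj, theta


n = 6
adj, theta = hub_instance(n)


def f(S):
    return len(simulate(adj, theta, S))


S1, S2 = set(range(1, n + 1)), {2 * n + 1}
print(f(S1), f(S2), f(S1 | S2))  # 6 1 13
```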

F.1.2 The influence function is not supermodular. 

Let $n$ be a sufficiently large integer that represents the number of vertices in the graph. Our family of technology diffusion problems $(G, \theta)$ (implicitly parameterized by $n$) is shown in Figure 13 and defined as follows:

• The vertex set is $\{v_1, \dots, v_n\}$.

Figure 13: Another instance of the technology diffusion problem (vertex labels $\theta = 2$ and $\theta = n$).



• The edge set is defined as follows:

- For any $i < j \le n-4$, $\{v_i, v_j\}$ is in the edge set, i.e., the subgraph induced by $\{v_1, \dots, v_{n-4}\}$ is a complete graph.

- The remaining edges are $\{v_1, v_{n-3}\}$, $\{v_1, v_{n-2}\}$, $\{v_{n-3}, v_{n-1}\}$, $\{v_{n-2}, v_n\}$, and $\{v_{n-3}, v_{n-2}\}$.

• The threshold function is

- for $i \le n-4$, $\theta(v_i) = 2$;

- for $i > n-4$, $\theta(v_i) = n$.

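To show this problem is not supermodular, we choose two disjoint sets $S_1$ and $S_2$ such that

$$f(S_1) + f(S_2) > f(S_1 \cup S_2). \tag{7}$$

We choose $S_1 = \{v_{n-3}\}$ and $S_2 = \{v_{n-2}\}$. Note that $S_1$ and $S_2$ are connected, and $f(S_1) = f(S_2) = n-3$ (each seed activates $v_1$ and then the whole clique, while every remaining vertex would join a component of fewer than $n$ nodes), while $f(S_1 \cup S_2) = n-2$, so that (7) indeed holds. □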
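The same check works for the Figure 13 family. The Python sketch below again assumes that an inactive node activates once the active component it would join, counting itself, has at least $\theta(u)$ nodes; `clique_with_tail` is a hypothetical name and requires $n > 4$.

```python
from itertools import combinations


def simulate(adj, theta, seeds):
    """Final active set, assuming u activates once the active component it
    would join (counting u itself) has at least theta[u] nodes."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for u in adj:
            if u in active:
                continue
            seen, stack = {u}, [u]
            while stack:
                x = stack.pop()
                for y in adj[x]:
                    if y in active and y not in seen:
                        seen.add(y)
                        stack.append(y)
            if len(seen) >= theta[u]:
                active.add(u)
                changed = True
    return active


def clique_with_tail(n):
    """Clique on v_1..v_{n-4} plus the five extra edges of Figure 13."""
    adj = {i: set() for i in range(1, n + 1)}

    def add_edge(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for a, b in combinations(range(1, n - 3), 2):  # v_1..v_{n-4}
        add_edge(a, b)
    for a, b in [(1, n - 3), (1, n - 2), (n - 3, n - 1), (n - 2, n), (n - 3, n - 2)]:
        add_edge(a, b)
    theta = {i: (2 if i <= n - 4 else n) for i in range(1, n + 1)}
    return adj, theta


n = 10
adj, theta = clique_with_tail(n)
S1, S2 = {n - 3}, {n - 2}
sizes = [len(simulate(adj, theta, S)) for S in (S1, S2, S1 | S2)]
print(sizes)  # [7, 7, 8]
```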

F.2 Randomized threshold case 

We now consider a modified version of our problem where, as in [23], we assume that nodes' thresholds are chosen uniformly at random:

Definition F.1 (Randomized technology diffusion optimization problem). The randomized technology diffusion model is identical to the model defined in Definition 2.1, with the exception that nodes choose their thresholds uniformly and independently at random from the set $\{2, 3, \dots, n\}$. Thus, the randomized technology diffusion optimization problem is to find the smallest feasible seedset $S$ in expectation over the nodes' choice of thresholds, when $G$ is given as input.

We follow [23] and let the influence function $f_G(S)$ be the expected number of vertices that are eventually activated, i.e., $f_G(S) = \mathbb{E}_\theta[f_{G,\theta}(S)]$, where $f_{G,\theta}(S)$ is the number of activated vertices and the expectation is taken over the choice of thresholds. We present two families of problem instances: each family will be on a graph of diameter at most 4, and will consider connected seedsets. The first family will fail to exhibit submodularity of $f_G(S)$, while the second will fail to exhibit supermodularity.
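Since $f_G(S)$ is an expectation over random thresholds, it can be estimated by sampling. The Python sketch below is a Monte Carlo estimator (an approximation, not an exact computation), assuming thresholds drawn i.i.d. uniformly from $\{2, \dots, |V|\}$ as in Definition F.1 and the activation rule that an inactive node activates once the active component it would join, counting itself, has at least $\theta(u)$ nodes. `estimate_influence` is a hypothetical name; it could be applied to the families below for small $n$.

```python
import random


def simulate(adj, theta, seeds):
    """Final active set, assuming u activates once the active component it
    would join (counting u itself) has at least theta[u] nodes."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for u in adj:
            if u in active:
                continue
            seen, stack = {u}, [u]
            while stack:
                x = stack.pop()
                for y in adj[x]:
                    if y in active and y not in seen:
                        seen.add(y)
                        stack.append(y)
            if len(seen) >= theta[u]:
                active.add(u)
                changed = True
    return active


def estimate_influence(adj, seeds, trials=2000, seed=0):
    """Monte Carlo estimate of f_G(S) with thresholds uniform on {2, ..., |V|}."""
    rng = random.Random(seed)
    N = len(adj)
    total = 0
    for _ in range(trials):
        theta = {u: rng.randint(2, N) for u in adj}  # randint is inclusive
        total += len(simulate(adj, theta, seeds))
    return total / trials


path = {0: {1}, 1: {0, 2}, 2: {1}}
print(estimate_influence(path, {0, 1, 2}))  # 3.0: every node is a seed
print(estimate_influence(path, {0}))        # a value between 1 and 3
```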
F.2.1 The influence function is not submodular.

Let $n$ be a sufficiently large integer such that the number of vertices in the network is $2n+1$. Our family of $G$ (parameterized by $n$) is defined as follows:

• The vertex set is $\{v_1, v_2, \dots, v_{2n+1}\}$.

• The edge set is constructed as follows:

- The subsets $\{v_1, \dots, v_n\}$ and $\{v_{n+1}, \dots, v_{2n}\}$ form two cliques.

- The remaining edges are $\{v_{2n+1}, v_1\}$ and $\{v_{2n+1}, v_{2n}\}$.

Notice that this family of graphs is almost identical to the non-submodular example presented in the previous section and shown in Figure 12, except that now the middle node $v_{2n+1}$ is only connected to $v_1$ and $v_{2n}$. We shall find two disjoint sets $S_1$ and $S_2$ such that

$$f_G(S_1) + f_G(S_2) < f_G(S_1 \cup S_2). \tag{8}$$

Our choice of $S_1$ and $S_2$ is $S_1 = \{v_1, \dots, v_n\}$ and $S_2 = \{v_{2n+1}\}$. We start by computing $f_G(S_1)$:

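$$f_G(S_1) = \mathbb{E}[f_{G,\theta}(S_1) \mid \theta(v_{2n+1}) \le n+1]\,\Pr[\theta(v_{2n+1}) \le n+1] + \mathbb{E}[f_{G,\theta}(S_1) \mid \theta(v_{2n+1}) > n+1]\,\Pr[\theta(v_{2n+1}) > n+1]. \tag{9}$$

Notice that

$$\mathbb{E}[f_{G,\theta}(S_1) \mid \theta(v_{2n+1}) \le n+1] = \mathbb{E}[f_{G,\theta}(S_1 \cup S_2)] = f_G(S_1 \cup S_2), \tag{10}$$
$$\mathbb{E}[f_{G,\theta}(S_1) \mid \theta(v_{2n+1}) > n+1] = n.$$

Since $\theta(v_{2n+1})$ is uniform on $\{2, \dots, 2n+1\}$, both conditioning events have probability $1/2$, and we may rewrite (9) as

$$f_G(S_1) = f_G(S_1 \cup S_2)\Pr[\theta(v_{2n+1}) \le n+1] + n\Pr[\theta(v_{2n+1}) > n+1] = \frac{f_G(S_1 \cup S_2) + n}{2}. \tag{11}$$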

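We next compute $f_G(S_2)$. To understand how the influence of $S_2 = \{v_{2n+1}\}$ spreads, we condition on the thresholds $\theta(v_1), \theta(v_{2n})$ of its neighbors:

$$\begin{aligned}
f_G(S_2) &\le 1 \cdot \Pr[\theta(v_1) > 2 \wedge \theta(v_{2n}) > 2] + (2n+1) \cdot \Pr[\theta(v_1) = 2 \vee \theta(v_{2n}) = 2] \\
&= 1 \cdot \Big(1 - \frac{1}{2n}\Big)\Big(1 - \frac{1}{2n}\Big) + (2n+1)\Big(2 \cdot \frac{1}{2n}\Big(1 - \frac{1}{2n}\Big) + \frac{1}{(2n)^2}\Big) \\
&= 1 + \frac{2n}{2n}\Big(2\Big(1 - \frac{1}{2n}\Big) + \frac{1}{2n}\Big) < 3.
\end{aligned} \tag{12}$$

Therefore, from (11) and (12) we have

$$f_G(S_1) + f_G(S_2) < 3 + \frac{1}{2}\big(f_G(S_1 \cup S_2) + n\big). \tag{13}$$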

Recall that our goal is to show that $f_G(S_1) + f_G(S_2) < f_G(S_1 \cup S_2)$. Using (13), we now see that it suffices to prove that

$$f_G(S_1 \cup S_2) \ge n + 6.$$

We prove this by conditioning on the event that $S_1 \cup S_2$ activates node $v_{2n}$:

$$f_G(S_1 \cup S_2) \ge f_G(S_1 \cup S_2 \cup \{v_{2n}\})\Pr[\theta(v_{2n}) \le n+1] + (n+1)\Pr[\theta(v_{2n}) > n+1] \ge n + 1 + \frac{n+1}{4},$$

where the last inequality follows because, in expectation, half of the non-seed vertices $\{v_{n+1}, \dots, v_{2n-1}\}$ have thresholds $\le n+1$ and therefore activate once $v_{2n}$ is active. Thus, $S_1$ and $S_2$ are connected and $f_G(S_1) + f_G(S_2) < f_G(S_1 \cup S_2)$ when $n$ is sufficiently large. □
F.2.2 The influence function is not supermodular.

Let $n$ be a sufficiently large integer such that the number of vertices in the network is $2n+1$. Our family of $G$ (parameterized by $n$) is defined as follows:

• The vertex set is $\{v_1, v_2, \dots, v_{2n+1}\}$.

• The edge set is constructed as follows:

- The subsets $\{v_1, \dots, v_n\}$ and $\{v_{n+1}, \dots, v_{2n}\}$ form two cliques.

- Vertex $v_{2n+1}$ is connected to all other nodes in the graph.

- There is an additional edge $\{v_1, v_{2n}\}$.

Notice that this family of graphs is almost identical to the one shown in Figure 12, except for the addition of a single edge $\{v_1, v_{2n}\}$. We shall find two disjoint sets $S_1$ and $S_2$ such that

$$f_G(S_1) + f_G(S_2) > f_G(S_1 \cup S_2). \tag{14}$$

Our choice of $S_1$ and $S_2$ is $S_1 = \{v_1, \dots, v_n\}$ and $S_2 = \{v_{n+1}, \dots, v_{2n}\}$. Notice that these sets are connected by the edge $\{v_1, v_{2n}\}$. By symmetry we have that $f(S_1) = f(S_2)$, so we start by computing $f_G(S_1)$. Let $T$ be the number of active nodes in $S_2$, and let $A$ be the event that vertex $v_{2n+1}$ is active.

$$\mathbb{E}[f_{G,\theta}(S_1)] \ge n + \big(1 + \mathbb{E}[T \mid A, S_1 \text{ active}]\big)\Pr[A \mid S_1 \text{ active}] \ge n + \frac{1}{2}\Big(1 + \frac{n+1}{2}\Big), \tag{15}$$

where the second inequality follows because $\Pr[A \mid S_1 \text{ active}] \ge 1/2$ and we used the trivial bound $\mathbb{E}[T \mid A, S_1 \text{ active}] \ge n \cdot \frac{n+1}{2n}$, where we ignore all cascading effects; we simply assume that each of the $n$ nodes in $S_2$ is connected to an active component of size $n+1$. On the other hand,

$$\mathbb{E}[f_{G,\theta}(S_1 \cup S_2)] = 2n + \Pr[A \mid S_1 \cup S_2 \text{ active}] = 2n + 1.$$

Thus we indeed have $f_G(S_1) + f_G(S_2) \ge 2n + 1 + \frac{n+1}{2} > 2n + 1 = f_G(S_1 \cup S_2)$ for all $n$. □

G Probability review 

Theorem G.1 (Chernoff bounds). Let $X_1, \dots, X_n$ be independent Poisson trials such that $\Pr[X_i = 1] = p_i$. Let $X = \sum_{i=1}^{n} X_i$ and $\mu = \mathbb{E}[X]$. Then

1. for $0 < \delta < 1$,

$$\Pr[|X - \mu| \ge \delta\mu] \le 2\exp(-\mu\delta^2/3);$$

2. for $R \ge 6\mu$,

$$\Pr[X \ge R] \le 2^{-R}.$$
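Both bounds can be sanity-checked against exact binomial tails, i.e., the special case $p_i = p$ for all $i$. The Python sketch below uses only `math.comb`; the parameter choices are illustrative and the helper names are hypothetical.

```python
import math


def binom_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def two_sided_tail(n, p, delta):
    """Exact Pr[|X - mu| >= delta * mu] for X ~ Binomial(n, p)."""
    mu = n * p
    return sum(binom_pmf(n, p, k) for k in range(n + 1) if abs(k - mu) >= delta * mu)


def upper_tail(n, p, R):
    """Exact Pr[X >= R] for an integer R."""
    return sum(binom_pmf(n, p, k) for k in range(R, n + 1))


for n, p, delta in [(100, 0.3, 0.5), (200, 0.1, 0.3), (50, 0.5, 0.2)]:
    mu = n * p
    exact, bound = two_sided_tail(n, p, delta), 2 * math.exp(-mu * delta ** 2 / 3)
    print(f"part 1: exact={exact:.3e} bound={bound:.3e}")

for n, p, R in [(100, 0.04, 30), (60, 0.05, 20)]:  # R >= 6 * n * p in both cases
    print(f"part 2: exact={upper_tail(n, p, R):.3e} bound={2.0 ** -R:.3e}")
```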
H Lemmas to prove Theorem 3.3



In the following, let $\mathcal{H}$ be the hypergraph corresponding to an optimal solution $\sigma$ to our relaxed LP in Figure El, and let $S_0 \leftarrow$ Prelim-Seedset($\mathcal{H}$) and $\mathcal{T}_0 \leftarrow$ Get-Seq($\mathcal{H}, S_0$). Also, for each $k \in \{1, \dots, \ell\}$, let $S_k \leftarrow$ Update-Seedset($\mathcal{H}, S_{k-1}$) and $\mathcal{T}_k \leftarrow$ Get-Seq($\mathcal{H}, S_k$) for $S_{k-1}$ that satisfies conditions (C.1)-(C.2).

H.1 Size of preliminary seedset $S_0$.

We argue that, with high probability, Prelim-Seedset gives us a preliminary seedset $S_0$ that is at most $O(r \ln n)$ times the one given by the optimal solution opt.

Lemma H.1. Let $S_0 \leftarrow$ Prelim-Seedset($\mathcal{H}$). We have

$$\Pr\big[|S_0| > 24(1+\epsilon)^2\, r \ln(2n)\,(2\,\mathrm{opt})\big] = o(1).$$



Proof. We show that Prelim-Seedset selects at most $24(1+\epsilon)^2 \ln(2n)|\sigma|$ seeds with high probability, where by $|\sigma|$ we mean the value of the objective function of the linear program, so that $|\sigma| = \sum_{i \le n} \sum_{t < \theta(u_i)} x_{i,t}$ (recall from Lemma 3.2 that this is at most $2\,\mathrm{opt}$). The lemma then follows from the fact that Glue-Seeds, used inside Prelim-Seedset, expands the seedset by at most a factor of $r$, and $|\sigma| \le 2\,\mathrm{opt}$.

Now let $Z_i$ be the indicator random variable that is set to 1 if $u_i$ is selected as a seed during the second step of Prelim-Seedset (i.e., before gluing). Then we have $\Pr[Z_i = 1] = \min\{1, 24(1+\epsilon)\ln(2n)\sum_{t < \theta(u_i)} x_{i,t}\}$. It follows that

$$\mathbb{E}\Big[\sum_{i \le n} Z_i\Big] \le 24(1+\epsilon)\ln(2n)\sum_{i \le n}\sum_{t < \theta(u_i)} x_{i,t} = 24(1+\epsilon)\ln(2n)\,|\sigma|.$$

Since the $Z_i$ are chosen independently, we may apply a Chernoff bound (Theorem G.1) and get

$$\Pr\Big[\sum_{i \le n} Z_i > 24(1+\epsilon)^2 \ln(2n)\,|\sigma|\Big] \le \exp\Big(-\mathbb{E}\Big[\sum_{i \le n} Z_i\Big]\epsilon^2/3\Big) = o(1),$$

and so our lemma follows. □
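The sampling step analysed above (the second step of Prelim-Seedset, before Glue-Seeds) is independent randomized rounding. The Python sketch below assumes the LP masses $\omega_i = \sum_{t<\theta(u_i)} x_{i,t}$ are already available as a list; `round_seeds` is a hypothetical name, the input values are illustrative rather than taken from the paper, and the gluing step is omitted.

```python
import math
import random


def round_seeds(omega, eps, rng):
    """Select u_i independently with probability min(1, 24(1+eps) ln(2n) * omega[i])."""
    n = len(omega)
    lam = 24 * (1 + eps) * math.log(2 * n)
    probs = [min(1.0, lam * w) for w in omega]
    seeds = {i for i, p in enumerate(probs) if rng.random() < p}
    return seeds, probs, lam


omega = [0.0, 0.001, 0.5, 0.002]  # illustrative LP masses
seeds, probs, lam = round_seeds(omega, eps=0.1, rng=random.Random(1))
expected = sum(probs)             # E[sum_i Z_i]
print(sorted(seeds), round(expected, 3), round(lam * sum(omega), 3))
```

A node whose mass already exceeds $1/\lambda$ is selected with certainty, and the expected number of selections never exceeds $\lambda|\sigma|$, which is the quantity the Chernoff bound above concentrates around.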



H.2 Feasibility of preliminary activation sequence $\mathcal{T}_0$.

Our lemma below addresses condition (P.1), and therefore the feasibility of $\mathcal{T}_0$. We show that, with good probability, the Get-Seq procedure sets at least one flag $b_{i,t} = 1$ for every node $u_i$.

Lemma H.2. Let $\mathcal{T}_0$ be obtained as described above. Consider an arbitrary $i \in [n]$. Let

$$t(i) = \min\Big\{t : \sum_{t' \le t} x_{i,t'} \ge \frac{1}{12(1+\epsilon)}\Big\}.$$

It follows that Get-Seq assigns the flags $b_{i,t}$ such that

$$\Pr\Big[\exists i : \bigwedge_{t' \le t(i)} (b_{i,t'} = 0)\Big] \le \frac{1}{2}.$$

The proof of Lemma H.2 relies on some new notation and technical lemmas. First, for each $i \in [n]$, let an arbitrary solution $\mathcal{F}_{i,t}$ for the $(i,t)$-flow problem be the representative flow for the $(i,t)$-flow problem. We also need the notion of border nodes:

Definition H.3 (Border nodes for the $(i,t)$-flow). Consider the $(i,t)$-flow problem on the hypergraph $\mathcal{H}$ and the corresponding $\mathcal{F}_{i,t}$. Let us decompose $\mathcal{F}_{i,t}$ into paths $\mathcal{P}_1, \mathcal{P}_2, \dots, \mathcal{P}_q$ (in an arbitrary but consistent manner). Consider an arbitrary one of these paths $\mathcal{P}_j$. Let $X_{i_j,t_j}$ be the last vertex on $\mathcal{P}_j$ such that $X_{i_j,t_j}$ is to the left of the threshold line. We denote $X_{i_j,t_j}$ as $\mathrm{border}(\mathcal{P}_j)$. The border nodes of the $(i,t)$-flow problem form the set of vertices $\beta(i,t) = \{\mathrm{border}(\mathcal{P}_1), \dots, \mathrm{border}(\mathcal{P}_q)\}$.

We refer to Figure 8 to illustrate an example. Consider the representative flow $\mathcal{F}_{E,6}$ for the $(E,6)$-flow problem. Suppose the flow decomposes into three paths, $\mathcal{P}_1 = X_{A,1} \to X_{C,3} \to X_{F,4} \to X_{C,5} \to X_{E,6}$, $\mathcal{P}_2 = X_{A,1} \to X_{C,3} \to X_{F,4} \to X_{E,6}$, and $\mathcal{P}_3 = X_{A,1} \to X_{B,5} \to X_{E,6}$. We have $\mathrm{border}(\mathcal{P}_1) = X_{F,4}$, because $X_{F,4}$ is the last time the path stays to the left of the threshold. Also, $\mathrm{border}(\mathcal{P}_2) = X_{F,4}$ and $\mathrm{border}(\mathcal{P}_3) = X_{B,5}$. Therefore, for this example, $\beta(E,6) = \{X_{B,5}, X_{F,4}\}$.
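The border-node rule is short in code. The Python sketch below assumes that a hypergraph node $X_{u,t}$ lies to the left of the threshold line exactly when $t < \theta(u)$, matching the sums $\sum_{t<\theta(u_i)}$ used in this appendix. The thresholds behind Figure 8 are not given in this section, so the values in `theta` are hypothetical ones chosen only to reproduce the example; `border_nodes` is likewise a hypothetical name.

```python
def border_nodes(paths, theta):
    """beta(i, t): the last left-of-threshold vertex on each path of the decomposition."""
    border = set()
    for path in paths:
        last_left = None
        for u, t in path:
            if t < theta[u]:  # X_{u,t} lies to the left of the threshold line
                last_left = (u, t)
        if last_left is not None:
            border.add(last_left)
    return border


theta = {"A": 2, "B": 6, "C": 4, "E": 3, "F": 5}  # hypothetical thresholds
paths = [
    [("A", 1), ("C", 3), ("F", 4), ("C", 5), ("E", 6)],
    [("A", 1), ("C", 3), ("F", 4), ("E", 6)],
    [("A", 1), ("B", 5), ("E", 6)],
]
print(sorted(border_nodes(paths, theta)))  # [('B', 5), ('F', 4)]
```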

Lemma H.4. Consider an arbitrary $(i,t)$-flow problem for the hypergraph $\mathcal{H}$ and its corresponding set of border nodes $\beta(i,t)$. The sum of the capacities of all the border nodes in $\beta(i,t)$ is at least as large as the sum of the demands of the sinks $X_{i,t'}$ for $\theta(u_i) \le t' \le t$, i.e.,

$$\sum_{X_{j,t'} \in \beta(i,t)} x_{j,t'} \ge \sum_{\theta(u_i) \le t' \le t} x_{i,t'}.$$



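Proof of Lemma H.4. Consider the representative flow $\mathcal{F}_{i,t}$ for the $(i,t)$ problem. Let $|\mathcal{F}_{i,t}|$ be the corresponding volume of the flow, and let $\mathcal{F}_{i,t}(X)$ be the volume of the flow on node $X$. Since every path $\mathcal{P}_j$ in the decomposition passes through its border node, we have

$$\sum_{X \in \beta(i,t)} \mathcal{F}_{i,t}(X) \ge |\mathcal{F}_{i,t}| = \sum_{\theta(u_i) \le t' \le t} x_{i,t'}.$$

On the other hand, we also require the capacity of any node $X$ to be no less than the actual flow $\mathcal{F}_{i,t}(X)$. Therefore,

$$\sum_{X_{j,t'} \in \beta(i,t)} x_{j,t'} \ge \sum_{X \in \beta(i,t)} \mathcal{F}_{i,t}(X).$$

The lemma therefore follows. □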

We are now ready to prove Lemma H.2.

Proof of Lemma H.2. Let $\beta(i, t(i))$ be the border nodes of the flow $\mathcal{F}_{i,t(i)}$, let

$$\omega_i = \sum_{t < \theta(u_i)} x_{i,t} \tag{16}$$

be the mass of node $u_i$ to the left of the threshold line, and let

$$p_i = \min\{(24(1+\epsilon)\ln(2n))\,\omega_i, 1\} \tag{17}$$

be the probability with which Prelim-Seedset fixes node $u_i$ as a seed.

A node $u_i$ turns on before timestep $t(i)$ in activation function $\mathcal{T}_0$, i.e., $\bigvee_{t' \le t(i)} (b_{i,t'} = 1)$, if either:

1. $u_i$ is selected as a seed; this happens with probability $p_i$.

2. One of the border nodes $X_{j,t'} \in \beta(i, t(i))$ corresponds to a seed node $u_j$; this happens with probability $p_j$.

Notice that the above two events are not necessarily independent, because $X_{i,t'}$ could be a border node in the $(i, t(i))$-flow problem.¹ Next, we have



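$$\begin{aligned}
\Pr\Big[\bigwedge_{t' \le t(i)} (b_{i,t'} = 0)\Big] &\le \min\Big\{1 - p_i, \prod_{j : \exists t',\, X_{j,t'} \in \beta(i,t(i))} (1 - p_j)\Big\} \\
&= \min\Big\{1 - \min\{(24(1+\epsilon)\ln(2n))\,\omega_i, 1\}, \prod_{j : \exists t',\, X_{j,t'} \in \beta(i,t(i))} \big(1 - \min\{(24(1+\epsilon)\ln(2n))\,\omega_j, 1\}\big)\Big\}.
\end{aligned} \tag{18}$$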



We use Lemma H.4 and some algebra to bound the quantity on the right-hand side of (18). Roughly, we use the idea discussed in Section [STTl: if the total flow at node $u$ at time $t$ is $f_1 + f_2$, then the probability that the technology is diffused to $u$ via either of these two flows is $1 - (1 - f_1)(1 - f_2) \approx f_1 + f_2$, so that the total flow can be used to determine node $u$'s activation probability. We start by noticing that



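$$\sum_{j : \exists t'} \omega_j = \sum_{j : \exists t'} \sum_{t < \theta(u_j)} x_{j,t} \ge \sum_{X_{j,t'} \in \beta(i,t(i))} x_{j,t'} \ge \sum_{t' : \theta(u_i) \le t' \le t(i)} x_{i,t'} \tag{19}$$

(where $j : \exists t'$ abbreviates $j : \exists t',\, X_{j,t'} \in \beta(i,t(i))$, the first equality holds by equation (16), and the last inequality holds because of Lemma H.4). Therefore, we have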



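$$2\max\Big\{\omega_i, \sum_{j : \exists t'} \omega_j\Big\} \ge \omega_i + \sum_{j : \exists t'} \omega_j \ge \sum_{t' < \theta(u_i)} x_{i,t'} + \sum_{\theta(u_i) \le t' \le t(i)} x_{i,t'} = \sum_{t' \le t(i)} x_{i,t'} \ge \frac{1}{12(1+\epsilon)} \tag{20}$$

(where the first inequality follows from algebra, the second from equations (16) and (19), the equality from algebra, and the final inequality from the definition of timestep $t(i)$ in the statement of this lemma). Our next step involves an arithmetic lemma, as follows: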



¹ While at first glance this seems to suggest the type of problematic recirculation of flow leading to the integrality gap we discussed in Appendix A, it is not actually a problem for us, since the flow through such an $X_{i,t'}$ cannot be amplified due to the flow constraints.
Lemma H.5. Let $\epsilon$ be a suitably small constant. Let $x_1, \dots, x_n$ be numbers in $[0,1]$ such that $\sum_{i \le n} x_i \ge s$, where $s \ge \frac{1}{24(1+\epsilon)}$, and let $\lambda = 24(1+\epsilon)\ln(2n)$. It follows that

$$\prod_{i \le n}\big(1 - \min\{\lambda x_i, 1\}\big) \le \frac{1}{2n}.$$
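A quick numerical check of the arithmetic lemma, with illustrative inputs: the Python sketch below evaluates the product for an equal split of the mass $s = 1/(24(1+\epsilon))$, which the proof identifies as the extremal case, and for a skewed split in which $\lambda x_1 \ge 1$. `rounding_failure_product` is a hypothetical name.

```python
import math


def rounding_failure_product(xs, eps):
    """prod_i (1 - min(lambda * x_i, 1)) with lambda = 24(1 + eps) ln(2n), n = len(xs)."""
    lam = 24 * (1 + eps) * math.log(2 * len(xs))
    return math.prod(1 - min(lam * x, 1) for x in xs)


eps = 0.1
s = 1 / (24 * (1 + eps))
for n in (10, 100, 1000):
    equal = [s / n] * n             # total mass exactly s, spread evenly
    skewed = [s] + [0.0] * (n - 1)  # all mass on one coordinate
    print(n, rounding_failure_product(equal, eps),
          rounding_failure_product(skewed, eps), 1 / (2 * n))
```

The equal split lands just below $1/(2n)$, because $(1 - \lambda s/n)^n < e^{-\lambda s} = 1/(2n)$, while the skewed split drives the product to 0.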

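Equation (20) allows us to apply the arithmetic lemma to the right side of (18): by (20), either $\omega_i$ alone or the collection $\{\omega_j : \exists t',\, X_{j,t'} \in \beta(i,t(i))\}$ has total mass at least $\frac{1}{24(1+\epsilon)}$, so

$$\min\Big\{1 - \min\{(24(1+\epsilon)\ln(2n))\,\omega_i, 1\}, \prod_{j : \exists t',\, X_{j,t'} \in \beta(i,t(i))} \big(1 - \min\{(24(1+\epsilon)\ln(2n))\,\omega_j, 1\}\big)\Big\} \le \frac{1}{2n},$$

and the lemma follows from a union bound across all nodes. □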

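Proof of Arithmetic Lemma H.5. Let us consider two cases over the values of $x_i$. In the first case, there exists some $x_i$ such that $\lambda x_i \ge 1$. For this case, we have

$$\prod_{i \le n}\big(1 - \min\{\lambda x_i, 1\}\big) = 0 < \frac{1}{2n}.$$

In the second case, where all $x_i$ are less than $1/\lambda$, we have $\prod_{i \le n}(1 - \min\{\lambda x_i, 1\}) = \prod_{i \le n}(1 - \lambda x_i)$, and by the AM-GM inequality this product is largest, subject to $\sum_{i \le n} x_i \ge s$, when $x_1 = x_2 = \dots = x_n = s/n$. In other words,

$$\prod_{i \le n}(1 - \lambda x_i) \le \Big(1 - \frac{\lambda s}{n}\Big)^n \le \exp(-\lambda s) \le \exp\Big(-\frac{\lambda}{24(1+\epsilon)}\Big) = \exp(-\ln(2n)) = \frac{1}{2n}.$$

□

H.3 Connectivity Lemma for the activation sequence.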



Next we prove the Connectivity Lemma, which shows that if Get-Seq takes in a connected seedset, it returns a connected activation sequence:

Proof of Connectivity Lemma 3.5. We inductively prove that $\bigcup_{t' \le t} \mathcal{T}^{-1}(t')$ is connected for $1 \le t \le n$. For the base case, observe that Get-Seq is such that $\mathcal{T}^{-1}(1) = S$ is the set of seeds, so $\mathcal{T}^{-1}(1)$ is connected. For any non-seed node $u_i$ such that $\mathcal{T}(u_i) > 1$, Get-Seq is such that the corresponding hypergraph node $X_{i,\mathcal{T}(u_i)}$ to the right of the threshold line is "activated". Further, there is a path in $\mathcal{H}$ from $X_{i,\mathcal{T}(u_i)}$ to some "activated" hypergraph node $X_{j,t}$ to the left of the threshold line, where $u_j \in S$ is a seed and $t < \theta(u_j)$. Each edge in $\mathcal{H}$ corresponds to an edge in $G$, and all hypergraph nodes along the path must be "activated" before $\mathcal{T}(u_i)$. Thus, $u_i$ is connected to the seedset $S$ in the subgraph of $G$ induced by $\{u_i\} \cup \big(\bigcup_{t < \mathcal{T}(u_i)} \mathcal{T}^{-1}(t)\big)$. Connectivity of $\mathcal{T}$ follows. □
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H.4 Partial consistency. 

The following lemma shows that partial consistency, i.e., condition (C.1), is almost immediate:
Lemma H.6. $S_k$ and $T_k$ are partially consistent up to $\theta_k - 1$ if

1. $S_{k-1}$ and $T_{k-1}$ are partially consistent up to time $\theta_{k-1} - 1$, and

2. $T_k$ satisfies (C.2); i.e., $T_k$ has at least $\theta_j - 1$ nodes active by timestep $\theta_j - 1$ for any $j < k$.

Proof. First, the Connectivity Lemma 3.5 implies that $T_k$ is connected. Therefore, to decide whether a node $u_i$ is a seed with respect to $T_k$, we need only count the number of active nodes (per $T_k$) prior to time $T_k(u_i)$ and compare it with $u_i$'s threshold $\theta(u_i)$. Thus, we need only prove that for any node $u_i$ such that $\theta_{k-1} \le T_k(u_i) \le \theta_k - 1$, either (a) the number of active nodes prior to time $T_k(u_i)$ is at least $\theta(u_i) - 1$, or (b) $u_i$ is a seed, i.e., $u_i \in S_k$. We have two cases:
Case 1. $T_k(u_i) < \theta(u_i)$. By construction of Get-Seq, it follows that $u_i \in S_k$ is a seed.

Case 2. $T_k(u_i) \ge \theta(u_i)$. Since we are only concerned with nodes such that $T_k(u_i) < \theta_k$, we have $\theta(u_i) \in \{\theta_1, \dots, \theta_{k-1}\}$. Since $T_k$ satisfies (C.2) by assumption, there are at least $\theta_j - 1$ active nodes at time $\theta_j - 1$ in $T_k$ for every $j < k$. Consequently, the number of activated nodes by time step $T_k(u_i) - 1$ is at least $\theta(u_i) - 1$ (i.e., $T_k$ encodes $u_i$ as a non-seed). □
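The case split above amounts to a simple rule for reading seeds off a connected activation sequence. A minimal sketch, assuming (for illustration only) that exactly one node activates per timestep starting at timestep 1, so that $T(u) - 1$ nodes are active before $u$; following condition (a) of the proof, $u$ is encoded as a non-seed when at least $\theta(u) - 1$ nodes are active before it. The function name `seeds_from_order` and the example thresholds are hypothetical:

```python
def seeds_from_order(order, theta):
    """Nodes that must be seeds for a connected activation order.

    Assumptions (for illustration only): exactly one node activates per timestep,
    timesteps start at 1, so T(u) - 1 nodes are active before u. Following
    condition (a) in the proof, u is a non-seed when at least theta[u] - 1 nodes
    are active before it, i.e. when T(u) >= theta[u]; otherwise u is a seed.
    """
    seeds = set()
    for t, u in enumerate(order, start=1):
        active_before = t - 1
        if active_before < theta[u] - 1:  # equivalently t < theta[u] (Case 1)
            seeds.add(u)
    return seeds

order = ["b", "c", "a", "d"]               # hypothetical activation order
theta = {"a": 2, "b": 2, "c": 3, "d": 5}   # hypothetical thresholds
print(sorted(seeds_from_order(order, theta)))  # ['b', 'c', 'd']
```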

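H.5 Proof of Gap Size Lemma 3.8

Proof of Gap Size Lemma 3.8. This proof follows almost entirely from algebra. First, recall that our LP requires that $\sum_{i \le n} x_{i,t} = 1$ for all $t$. Therefore, simple algebra gives $\theta_k - 1 = \sum_{t \le \theta_k - 1} \sum_{i \le n} x_{i,t}$. The same simple algebra allows us to write
$$|\{u_i : T_{k-1}(u_i) \le \theta_k - 1\}| = \sum_{u_i : T_{k-1}(u_i) \le \theta_k - 1} 1 = \sum_{u_i : T_{k-1}(u_i) \le \theta_k - 1}\ \sum_{t \le n} x_{i,t}.$$
Substituting these expressions into (0), we have
$$p = \theta_k - 1 - |\{u_i : T_{k-1}(u_i) \le \theta_k - 1\}| = \sum_{t \le \theta_k - 1}\ \sum_{i \le n} x_{i,t} - \sum_{u_i : T_{k-1}(u_i) \le \theta_k - 1}\ \sum_{t \le n} x_{i,t}.$$
Next, since $\theta_k - 1 < n$ and $x_{i,t} \ge 0$, restricting the inner summation of the second summand to $t \le \theta_k - 1$ can only decrease that summand, so we obtain
$$p \le \sum_{t \le \theta_k - 1}\ \sum_{i \le n} x_{i,t} - \sum_{u_i : T_{k-1}(u_i) \le \theta_k - 1}\ \sum_{t \le \theta_k - 1} x_{i,t} = \sum_{t \le \theta_k - 1}\Big(\sum_{i \le n} x_{i,t} - \sum_{u_i : T_{k-1}(u_i) \le \theta_k - 1} x_{i,t}\Big) \quad \text{(moving the } t \text{ index)}.$$
Finally, consider the term inside the outer sum. Its first summand is over all vertices $u_i$, $i = 1, \dots, n$, while its second summand is over all vertices $u_i$ such that $T_{k-1}(u_i) < \theta_k$. Thus the difference between these summands is over all vertices $u_i$ such that $T_{k-1}(u_i) \ge \theta_k$, so finally we have
$$p \le \sum_{u_i : T_{k-1}(u_i) \ge \theta_k}\ \sum_{t < \theta_k} x_{i,t}.$$
□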
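The algebra can be sanity-checked numerically. The sketch below assumes, as the derivation uses, that $x$ is an $n \times n$ matrix with unit row and column sums (built here as a random convex combination of permutation matrices), and it lets the activation times $T_{k-1}(u_i)$ be arbitrary values in $\{1, \dots, n\}$, since the derivation never uses that $T_{k-1}$ is a bijection. The names `random_doubly_stochastic`, `gap_lhs`, and `gap_rhs` are illustrative only:

```python
import random

def random_doubly_stochastic(n, terms, rng):
    """Random convex combination of permutation matrices: unit row and column sums."""
    weights = [rng.random() + 1e-9 for _ in range(terms)]
    total = sum(weights)
    x = [[0.0] * n for _ in range(n)]
    for w in weights:
        perm = list(range(n))
        rng.shuffle(perm)
        for i in range(n):
            x[i][perm[i]] += w / total
    return x

def gap_lhs(T, theta_k):
    """p = (theta_k - 1) - |{u_i : T(u_i) <= theta_k - 1}|."""
    return (theta_k - 1) - sum(1 for t in T if t <= theta_k - 1)

def gap_rhs(x, T, theta_k):
    """Sum over u_i with T(u_i) >= theta_k of sum_{t < theta_k} x_{i,t}.

    Column c of x holds timestep c + 1, so t < theta_k means columns 0 .. theta_k - 2.
    """
    return sum(sum(x[i][: theta_k - 1]) for i, t in enumerate(T) if t >= theta_k)

rng = random.Random(0)
n = 8
for _ in range(1000):
    x = random_doubly_stochastic(n, terms=5, rng=rng)
    T = [rng.randint(1, n) for _ in range(n)]  # hypothetical activation times
    theta_k = rng.randint(1, n)                # ensures theta_k - 1 < n
    assert gap_lhs(T, theta_k) <= gap_rhs(x, T, theta_k) + 1e-9
print("Gap Size bound held on all trials")
```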



H.6 Proof of Lemma 3.9

Lemma 3.9 is straightforward given the following Lemma H.7, as we shall show in Section H.6.2. In this section we focus on our main task of proving Lemma H.7. Recall that the "gap" is given by
$$\sum_{u_i : T_{k-1}(u_i) \ge \theta_k}\ \sum_{t < \theta_k} x_{i,t}.$$
The following lemma handles each "row" of the gap (i.e., each $u_i$ with $T_{k-1}(u_i) \ge \theta_k$) separately:

Lemma H.7. Let $u_i$ be an arbitrary node such that $T_{k-1}(u_i) \ge \theta_k$, so that $u_i$ is a candidate for moving forward in activation sequence $T_k$ (relative to activation sequence $T_{k-1}$). It follows that
$$\Pr[T_k(u_i) < \theta_k] \ge (1+\epsilon) \sum_{t < \theta_k} x_{i,t}.$$

Before we begin the proof, we need a few definitions related to those introduced in Appendix H.2:

Definition H.8 (Border flow). Consider the $(i,t)$-flow problem and the corresponding representative flow $\mathcal{F}_{i,t}$. Let $\mathcal{P}_1, \dots, \mathcal{P}_q$ be the decomposition of $\mathcal{F}_{i,t}$ so that $\beta(i,t) = \{\mathrm{border}(\mathcal{P}_1), \dots, \mathrm{border}(\mathcal{P}_q)\}$. Fix an arbitrary $X \in \beta(i,t)$ and define the flow across the border with respect to $\mathcal{F}_{i,t}$ as
$$f_{i,t}(X) = \sum_{j : \mathrm{border}(\mathcal{P}_j) = X} |\mathcal{P}_j|.$$
We shall refer to $f_{i,t}(\cdot)$ as the border flow function with respect to the $(i,t)$-flow problem.

Notice that the border flow $f_{i,t}(X)$ need not equal the representative flow $\mathcal{F}_{i,t}(X)$ that passes through $X$ (in general $f_{i,t}(X) \le \mathcal{F}_{i,t}(X)$) because, e.g., there could be two paths among $\mathcal{P}_1, \dots, \mathcal{P}_q$ that pass through $X$, where $X$ is the border of one path but not of the other.

We refer back to Figure 8 for an example. Consider the $(B,6)$-flow problem and let $\mathcal{F}_{B,6}$ consist of two paths: $\mathcal{P}_1 = X_{A,1} \to X_{B,5}$ and $\mathcal{P}_2 = X_{A,1} \to X_{C,3} \to X_{D,4} \to X_{E,5} \to X_{B,6}$. Notice that $\beta(B,6) = \{X_{A,1}, X_{D,4}\}$. We have $\mathcal{F}_{B,6}(X_{A,1}) = |\mathcal{F}_{B,6}|$, while $f_{B,6}(X_{A,1})$ consists only of the volume of flow along the path $\mathcal{P}_1$; i.e., $\mathcal{F}_{B,6}(X_{A,1}) \ne f_{B,6}(X_{A,1})$.
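To make Definition H.8 concrete, the sketch below stores a path decomposition as (path, volume, border node) triples and computes both the border flow $f_{i,t}(X)$ and the flow $\mathcal{F}_{i,t}(X)$ passing through $X$. The node labels mirror the two-path example above, but the volumes 0.5 and 0.25 are made up for illustration, and the triple layout is an assumption rather than the paper's data structure. The last line also checks the bookkeeping behind Fact H.1: the border flows add up to the total volume of the paths.

```python
from collections import defaultdict

def border_flow(decomposition):
    """f(X): total volume of the decomposition paths whose border node is X."""
    f = defaultdict(float)
    for _path, volume, border in decomposition:
        f[border] += volume
    return dict(f)

def through_flow(decomposition):
    """F(X): total volume of the decomposition paths that pass through X."""
    F = defaultdict(float)
    for path, volume, _border in decomposition:
        for node in set(path):
            F[node] += volume
    return dict(F)

# Each entry is (path, volume, border node). Labels follow the example in the
# text; the volumes 0.5 and 0.25 are hypothetical.
decomposition = [
    (["A1", "B5"], 0.5, "A1"),
    (["A1", "C3", "D4", "E5", "B6"], 0.25, "D4"),
]

f = border_flow(decomposition)
F = through_flow(decomposition)
print(f["A1"], F["A1"])  # 0.5 0.75: X_{A,1} carries more flow than its border flow
# Fact H.1 bookkeeping: the border flows add up to the total volume of the paths.
print(sum(f.values()) == sum(volume for _, volume, _ in decomposition))  # True
```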

Our analysis for Lemma H.7 uses the following fact.

Fact H.1. Consider the $(i,t)$-flow problem on the graph $\mathcal{H}$ and the corresponding border flow function $f_{i,t}(\cdot)$. We have
$$\sum_{X \in \beta(i,t)} f_{i,t}(X) = \sum_{\theta(u_i) \le t' \le t} x_{i,t'}. \qquad (21)$$

This fact is intuitively straightforward, because all the "border flow" must eventually reach the sinks; the formal argument is fairly tedious, so we present it after the proof of Lemma H.7.
Proof of Lemma H.7. We concern ourselves with a node $u_i$ that activates at timestep $\theta_k$ or later in activation sequence $T_{k-1}$. Let us consider the $(i, \theta_k - 1)$-flow problem and the corresponding border nodes $\beta(i, \theta_k - 1)$ as defined in Definition H.3. Since $u_i$ turns on after timestep $\theta_k - 1$ in $T_{k-1}$, none of the border nodes can be activated in $T_{k-1}$; otherwise, Get-Seq would have activated $u_i$ by timestep $\theta_k - 1$ in $T_{k-1}$.

Now let us consider $T_k$. A sufficient condition for $u_i$ to be activated in $T_k$ by timestep $\theta_k - 1$ is either (a) $u_i$ is a seed, or (b) at least one node $u_j$ corresponding to a border node in $\beta(i, \theta_k - 1)$ is selected as a seed. (Since by definition border nodes $X_{j,t}$ always have $t < \theta(u_j)$, Get-Seq is such that the only way a border node $X_{j,t}$ can be activated is if it corresponds to a seed $u_j$.)

Now node $u_i$ will be activated before time $\theta_k$ in $T_k$ if either (a) at least one of the border nodes $X_{j,t} \in \beta(i, \theta_k - 1)$ that were not "active" in $T_{k-1}$ becomes active in $T_k$ (for each such node this occurs with probability $4(1+\epsilon)\omega_j$, where $\omega_j$ is as in equation (16), since $T_k$ is obtained from Get-Seq and $S_k$ from Update-Seedset), or (b) $u_i$ is itself a seed in $S_k$ (this occurs with probability $4(1+\epsilon)\omega_i$). Notice that events (a) and (b) could be correlated, so we have that
$$\Pr[T_k(u_i) \ge \theta_k] \le \min\Big\{1 - 4(1+\epsilon)\omega_i,\ \prod_{u_j \in \beta(i, \theta_k - 1)} \big(1 - 4(1+\epsilon)\omega_j\big)\Big\}. \qquad (22)$$

Given equation (22), our lemma follows from the following claim:

Claim H.1.
$$\min\Big\{1 - 4(1+\epsilon)\omega_i,\ \prod_{u_j \in \beta(i, \theta_k - 1)} \big(1 - 4(1+\epsilon)\omega_j\big)\Big\} \le 1 - (1+\epsilon) \sum_{t < \theta_k} x_{i,t}. \qquad (23)$$

□
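As a sanity check on (22), the sketch below assumes (our assumption for illustration, not a description of Update-Seedset) that every node $u$ becomes a seed independently with probability $\min\{1, 4(1+\epsilon)\omega_u\}$, and that $u_i$ is activated before $\theta_k$ whenever $u_i$ or some node of $\beta(i, \theta_k - 1)$ is a seed. It compares a Monte Carlo estimate of $\Pr[T_k(u_i) \ge \theta_k]$ with the right-hand side of (22). The $\omega$ values and all function names are hypothetical; note that the minimum in (22) holds under any correlation, so under independence it is a loose bound.

```python
import math
import random

def seed_prob(omega, eps):
    """Hypothetical per-node seeding probability min{1, 4(1 + eps) * omega}."""
    return min(1.0, 4 * (1 + eps) * omega)

def bound_22(omega_i, omega_border, eps):
    """Right-hand side of (22): the smaller of the two 'no seed' probabilities."""
    own = 1 - seed_prob(omega_i, eps)
    border = math.prod(1 - seed_prob(w, eps) for w in omega_border)
    return min(own, border)

def estimate_not_activated(omega_i, omega_border, eps, trials, rng):
    """Monte Carlo estimate of Pr[u_i is not activated before theta_k].

    Assumes independent seeding, and that u_i is activated in time whenever u_i
    itself or some border node of beta(i, theta_k - 1) is a seed.
    """
    misses = 0
    for _ in range(trials):
        own = rng.random() < seed_prob(omega_i, eps)
        border = any(rng.random() < seed_prob(w, eps) for w in omega_border)
        misses += not (own or border)
    return misses / trials

rng = random.Random(42)
omega_i, omega_border, eps = 0.02, [0.01, 0.03, 0.015], 0.1  # made-up values
estimate = estimate_not_activated(omega_i, omega_border, eps, 100_000, rng)
print(round(estimate, 3), round(bound_22(omega_i, omega_border, eps), 3))
```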

Roughly speaking, the idea in Claim H.1 is to use a first-order approximation to bound the product term $\prod_{u_j \in \beta(i, \theta_k - 1)} (1 - 4(1+\epsilon)\omega_j)$. By considering only the linear terms in this quantity, we get $\prod_{u_j \in \beta(i, \theta_k - 1)} (1 - 4(1+\epsilon)\omega_j) \approx 1 - O\big(\sum_{u_j \in \beta(i, \theta_k - 1)} \omega_j\big)$. Together with the identity established in Fact H.1, we can rewrite $O\big(\sum_{u_j \in \beta(i, \theta_k - 1)} \omega_j\big) = O\big(\sum_{\theta(u_i) \le t' \le \theta_k - 1} x_{i,t'}\big)$, which allows us to conclude Claim H.1. We now formalize this idea step by step:



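Proof of Claim H.1. We start by analyzing the product term (the right-hand term of the minimum) in inequality (23). Recall that $\mathcal{F}_{i,\theta_k-1}$ is the representative flow for the $(i, \theta_k - 1)$-flow problem, and $\mathcal{F}_{i,\theta_k-1}(X_{j,t'})$ is the corresponding flow that passes through the node $X_{j,t'}$. For each such node we have $\mathcal{F}_{i,\theta_k-1}(X_{j,t'}) \le x_{j,t'}$, since $x_{j,t'}$ represents the capacity of the node $X_{j,t'}$ per the flow constraints. Now we consider the terms inside the product on the left-hand side of equation (23):
$$\begin{aligned}
1 - 4(1+\epsilon)\omega_j &= 1 - 4(1+\epsilon) \sum_{t < \theta(u_j)} x_{j,t} && \text{(definition of } \omega_j \text{ in equation (16))} \qquad (24)\\
&\le 1 - 4(1+\epsilon) \sum_{t < \theta(u_j)} \mathcal{F}_{i,\theta_k-1}(X_{j,t}) && \text{(flow is bounded by capacity)}\\
&\le 1 - 4(1+\epsilon) \sum_{\substack{t < \theta(u_j)\\ X_{j,t} \in \beta(i, \theta_k - 1)}} \mathcal{F}_{i,\theta_k-1}(X_{j,t}) && \text{(algebra: drop the non-border terms)}\\
&\le 1 - 4(1+\epsilon) \sum_{\substack{t < \theta(u_j)\\ X_{j,t} \in \beta(i, \theta_k - 1)}} f_{i,\theta_k-1}(X_{j,t}) && \text{(construction of } f_{i,t}(\cdot) \text{ in Definition H.8)}
\end{aligned}$$
Notice that $X_{j,t}$ can be in $\beta(i, \theta_k - 1)$ only if $X_{j,t}$ is to the left of the threshold line, i.e., $t < \theta(u_j)$. Therefore, $X_{j,t} \in \beta(i, \theta_k - 1)$ implies $t < \theta(u_j)$, so we can write
$$1 - 4(1+\epsilon)\omega_j \le 1 - 4(1+\epsilon) \sum_{t : X_{j,t} \in \beta(i, \theta_k - 1)} f_{i,\theta_k-1}(X_{j,t}). \qquad (25)$$

Next, let us analyze the term $(1 - 4(1+\epsilon)\omega_i)$ in (23). We have
$$\begin{aligned}
1 - 4(1+\epsilon)\omega_i &= 1 - 4(1+\epsilon) \sum_{t < \theta(u_i)} x_{i,t} && \text{(definition of } \omega_i \text{ in (16))}\\
&\le 1 - 4(1+\epsilon) \sum_{t < \min\{\theta_k,\, \theta(u_i)\}} x_{i,t} && \text{(algebra)} \qquad (26)
\end{aligned}$$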



We can substitute (26) and (25) back into our original equation (23) to yield a giant product. We won't write down this messy product yet. Instead, we show how to clean it up by approximating it with its lower-order terms. Specifically, we shall use linear terms to approximate the giant product as follows:

Lemma H.9 (Another Arithmetic Lemma). Let $x_1, x_2, \dots, x_n$ be nonnegative real values such that $\sum_{i \le n} x_i < 1$. We have
$$\prod_{i \le n} (1 - x_i) \le 1 - \sum_{i \le n} x_i + \frac{\left(\sum_{i \le n} x_i\right)^2}{1 - \sum_{i \le n} x_i}.$$
Specifically, when $\sum_{i \le n} x_i \le 1/3$, we have
$$\prod_{i \le n} (1 - x_i) \le 1 - \frac{1}{2} \sum_{i \le n} x_i.$$
We present a proof of this arithmetic lemma after we complete the current proof. To use the arithmetic lemma to clean up our product, we need to show that the condition specified in the lemma holds in our setting:



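Claim H.2.
$$\sum_{t < \min\{\theta_k,\, \theta(u_i)\}} x_{i,t} + \sum_{X_{j,t} \in \beta(i, \theta_k - 1)} f_{i,\theta_k-1}(X_{j,t}) \le \frac{1}{12(1+\epsilon)}. \qquad (27)$$

Proof. Starting with the left side of (27), we can write
$$\begin{aligned}
\sum_{t < \min\{\theta_k,\, \theta(u_i)\}} x_{i,t} + \sum_{X_{j,t} \in \beta(i, \theta_k - 1)} f_{i,\theta_k-1}(X_{j,t}) &= \sum_{t < \min\{\theta_k,\, \theta(u_i)\}} x_{i,t} + \sum_{\theta(u_i) \le t' \le \theta_k - 1} x_{i,t'} && \text{(using Fact H.1)} \qquad (28)\\
&= \sum_{t \le \theta_k - 1} x_{i,t} && \text{(algebra)} \qquad (29)\\
&\le \frac{1}{12(1+\epsilon)}.
\end{aligned}$$
We obtained the last inequality as follows. Property (P.1) required by Prelim-Seedset tells us that if a node $u_i$ is on at a timestep later than $\theta_k - 1$ in activation sequence $T_0$, then
$$\sum_{t \le \theta_k - 1} x_{i,t} \le \frac{1}{12(1+\epsilon)}.$$
In this lemma we are concerned only with nodes $u_i$ that are on at a timestep later than $\theta_k - 1$ in activation sequence $T_{k-1}$. However, recall the relationship between $T_0$ and $T_{k-1}$: any node that is on after $\theta_k - 1$ in $T_{k-1}$ must be on after $\theta_k - 1$ in $T_0$ as well. Thus, we can conclude that all of the $u_i$'s we consider in this lemma have $\sum_{t \le \theta_k - 1} x_{i,t} \le \frac{1}{12(1+\epsilon)}$.

□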
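The "(algebra)" step (29) relies only on the two index ranges in (28) partitioning $\{1, \dots, \theta_k - 1\}$, whichever of $\theta_k$ and $\theta(u_i)$ is smaller. A quick check of that bookkeeping; the row of $x_{i,\cdot}$ values is made up and `split_sum` is an illustrative name:

```python
def split_sum(row, theta_k, theta_i):
    """Left side of (28) after Fact H.1:
    sum_{t < min(theta_k, theta_i)} x_{i,t} + sum_{theta_i <= t <= theta_k - 1} x_{i,t}.

    `row[t - 1]` holds x_{i,t} for timesteps t = 1..n.
    """
    first = sum(row[t - 1] for t in range(1, min(theta_k, theta_i)))
    second = sum(row[t - 1] for t in range(theta_i, theta_k))  # empty if theta_i >= theta_k
    return first + second

row = [0.01, 0.02, 0.005, 0.0, 0.03, 0.01]  # hypothetical x_{i,1..6}
for theta_k in range(1, len(row) + 1):
    for theta_i in range(1, len(row) + 1):
        direct = sum(row[: theta_k - 1])  # sum_{t <= theta_k - 1} x_{i,t}
        assert abs(split_sum(row, theta_k, theta_i) - direct) < 1e-12
print("index ranges partition {1, ..., theta_k - 1}")
```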



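Finally, we apply the arithmetic Lemma H.9 to the giant product obtained when we substitute (25) and (26) back into our original equation (23), and obtain:
$$\begin{aligned}
&\min\Big\{1 - 4(1+\epsilon)\omega_i,\ \prod_{u_j \in \beta(i, \theta_k - 1)} \big(1 - 4(1+\epsilon)\omega_j\big)\Big\}\\
&\quad\le \big(1 - 4(1+\epsilon)\omega_i\big)^{1/2} \prod_{u_j \in \beta(i, \theta_k - 1)} \big(1 - 4(1+\epsilon)\omega_j\big)^{1/2} && \text{(since } \min\{a,b\} \le \sqrt{ab} \text{ for } a, b \ge 0\text{)}\\
&\quad\le \Big(1 - 4(1+\epsilon) \sum_{t < \min\{\theta_k,\, \theta(u_i)\}} x_{i,t}\Big)^{1/2} \prod_{u_j \in \beta(i, \theta_k - 1)} \Big(1 - 4(1+\epsilon) \sum_{t : X_{j,t} \in \beta(i, \theta_k - 1)} f_{i,\theta_k-1}(X_{j,t})\Big)^{1/2} && \text{(by (25) and (26))}\\
&\quad\le \Big(1 - 2(1+\epsilon)\Big[\sum_{t < \min\{\theta_k,\, \theta(u_i)\}} x_{i,t} + \sum_{u_j \in \beta(i, \theta_k - 1)}\ \sum_{t : X_{j,t} \in \beta(i, \theta_k - 1)} f_{i,\theta_k-1}(X_{j,t})\Big]\Big)^{1/2} && \text{(by Lemma H.9 and Claim H.2)}\\
&\quad= \Big(1 - 2(1+\epsilon) \sum_{t < \theta_k} x_{i,t}\Big)^{1/2} && \text{(by (21))}\\
&\quad\le 1 - (1+\epsilon) \sum_{t < \theta_k} x_{i,t} && \text{(since } \sqrt{1 - 2a} \le 1 - a\text{)}
\end{aligned}$$
The claim follows.

□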
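A randomized sanity check of the chain above, written directly in terms of the lower bounds: `A` stands for $\sum_{t < \min\{\theta_k, \theta(u_i)\}} x_{i,t}$ and the entries of `B` for the per-node border-flow sums $\sum_{t : X_{j,t} \in \beta(i,\theta_k-1)} f_{i,\theta_k-1}(X_{j,t})$. Samples are rescaled so that the bound of Claim H.2 holds. This only tests the inequality on random inputs; it is an illustration, not a proof, and the function names are hypothetical.

```python
import math
import random

def claim_h1_lhs(A, B, eps):
    """min{1 - 4(1+eps) A, prod_j (1 - 4(1+eps) B_j)}: (23) after substituting (25), (26)."""
    c = 4 * (1 + eps)
    return min(1 - c * A, math.prod(1 - c * b for b in B))

def claim_h1_rhs(A, B, eps):
    """1 - (1+eps)(A + sum_j B_j); in the proof, (21) turns A + sum_j B_j into sum_{t<theta_k} x_{i,t}."""
    return 1 - (1 + eps) * (A + sum(B))

rng = random.Random(3)
for _ in range(10_000):
    eps = rng.uniform(0.0, 1.0)
    raw = [rng.uniform(0.01, 1.0) for _ in range(rng.randint(1, 7))]
    budget = rng.uniform(0.0, 1 / (12 * (1 + eps)))  # enforce the bound of Claim H.2
    scale = budget / sum(raw)
    A, B = raw[0] * scale, [r * scale for r in raw[1:]]
    assert claim_h1_lhs(A, B, eps) <= claim_h1_rhs(A, B, eps) + 1e-12
print("Claim H.1 chain held on all samples")
```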
H.6.1 Omitted proofs from the proof of Lemma H.7

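Proof of Fact H.1. Recall that we let $I(Y)$ be an indicator function that is 1 if and only if $Y$ is true. The border flow function $f_{i,t}(\cdot)$ can be rewritten as
$$f_{i,t}(X) = \sum_{j : \mathrm{border}(\mathcal{P}_j) = X} |\mathcal{P}_j| = \sum_{j \le q} I(X = \mathrm{border}(\mathcal{P}_j))\, |\mathcal{P}_j|. \qquad (30)$$
Notice also that, since each path has exactly one border node, we have
$$|\mathcal{P}_j| = \sum_{X \in V(\mathcal{H})} I(X = \mathrm{border}(\mathcal{P}_j))\, |\mathcal{P}_j|. \qquad (31)$$
Therefore,
$$\begin{aligned}
\sum_{j \le q} |\mathcal{P}_j| &= \sum_{j \le q} \Big(\sum_{X \in V(\mathcal{H})} I(X = \mathrm{border}(\mathcal{P}_j))\, |\mathcal{P}_j|\Big) && \text{(by Equation (31))}\\
&= \sum_{X \in V(\mathcal{H})}\ \sum_{j \le q} I(X = \mathrm{border}(\mathcal{P}_j))\, |\mathcal{P}_j|\\
&= \sum_{X \in V(\mathcal{H})} f_{i,t}(X) && \text{(by Equation (30))}\\
&= \sum_{X \in \beta(i,t)} f_{i,t}(X) && \text{(only border nodes have a nonzero value of } f_{i,t}(\cdot)\text{)}.
\end{aligned}$$
Finally, our claim follows from the fact that
$$\sum_{\theta(u_i) \le t' \le t} x_{i,t'} = \sum_{j \le q} |\mathcal{P}_j|.$$

□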

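Proof of Another Arithmetic Lemma H.9. Write $s = \sum_{i \le n} x_i$. Expanding the product, we have
$$\begin{aligned}
\prod_{i \le n} (1 - x_i) &= 1 - \sum_{i \le n} x_i + \sum_{i_1 < i_2} x_{i_1} x_{i_2} - \sum_{i_1 < i_2 < i_3} x_{i_1} x_{i_2} x_{i_3} + \cdots + (-1)^k \sum_{i_1 < \cdots < i_k} x_{i_1} \cdots x_{i_k} + \cdots\\
&\le 1 - \sum_{i \le n} x_i + \sum_{i_1, i_2 \in [n]} x_{i_1} x_{i_2} + \sum_{i_1, i_2, i_3 \in [n]} x_{i_1} x_{i_2} x_{i_3} + \cdots + \sum_{i_1, \dots, i_k \in [n]} x_{i_1} \cdots x_{i_k} + \cdots\\
&= 1 - s + s^2 + s^3 + \cdots = 1 - s + \frac{s^2}{1 - s},
\end{aligned}$$
where the inequality holds because every $x_i \ge 0$. If moreover $s \le 1/3$, then $\frac{s^2}{1-s} \le \frac{s \cdot (1/3)}{2/3} = \frac{s}{2}$, which gives $\prod_{i \le n} (1 - x_i) \le 1 - \frac{s}{2}$.

□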
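The expansion in the proof is the identity $\prod_i (1 - x_i) = \sum_k (-1)^k e_k(x)$ in terms of elementary symmetric polynomials $e_k$. The sketch below computes $e_k$ with the standard dynamic program, checks that identity, and checks both bounds of Lemma H.9 on random nonnegative inputs with $\sum_i x_i < 1$; the sampling scheme is an arbitrary choice for illustration.

```python
import math
import random

def elementary_symmetric(xs):
    """e[k] = sum over all k-element subsets of xs of the product of their entries."""
    e = [1.0] + [0.0] * len(xs)
    for x in xs:
        for k in range(len(xs), 0, -1):  # descend so each x is used at most once per subset
            e[k] += e[k - 1] * x
    return e

rng = random.Random(5)
for _ in range(2000):
    raw = [rng.uniform(0.01, 1.0) for _ in range(rng.randint(1, 8))]
    total = sum(raw)
    target = rng.uniform(0.0, 0.99)
    xs = [r * target / total for r in raw]  # nonnegative entries summing to < 1
    s = sum(xs)
    prod = math.prod(1 - x for x in xs)
    alternating = sum((-1) ** k * ek for k, ek in enumerate(elementary_symmetric(xs)))
    assert abs(prod - alternating) < 1e-12           # the exact expansion
    assert prod <= 1 - s + s * s / (1 - s) + 1e-12   # general bound of Lemma H.9
    if s <= 1 / 3:
        assert prod <= 1 - s / 2 + 1e-12             # specialized bound
print("Lemma H.9 held on all samples")
```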
H.6.2 Obtaining Lemma 3.9 from Lemma H.7

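Proof of Lemma 3.9. Let $I(X)$ be an indicator variable that is 1 if $X$ is true. We have
$$p = \sum_{u_i : T_{k-1}(u_i) \ge \theta_k} I(T_k(u_i) < \theta_k).$$
We start by computing $E[p]$:
$$\begin{aligned}
E[p] &= \sum_{u_i : T_{k-1}(u_i) \ge \theta_k} E[I(T_k(u_i) < \theta_k)]\\
&\ge \sum_{u_i : T_{k-1}(u_i) \ge \theta_k} (1+\epsilon) \sum_{t < \theta_k} x_{i,t} && \text{(using Lemma H.7)}\\
&= (1+\epsilon)\gamma && \text{(definition of } \gamma\text{)}.
\end{aligned}$$
To prove the inequality, we use $0 \le p \le n$ to write
$$(1+\epsilon)\gamma \le E[p] \le \Pr[p > \gamma] \cdot n + \gamma,$$
and algebra shows that $\Pr[p > \gamma] \ge \epsilon \cdot \gamma / n$. □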
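The last step is a reverse Markov argument: because $0 \le p \le n$, splitting $E[p]$ on the event $p > \gamma$ gives $E[p] \le n \Pr[p > \gamma] + \gamma$. The sketch below checks the resulting bound $\Pr[p > \gamma] \ge \epsilon\gamma/n$ on random distributions over $\{0, \dots, n\}$, choosing $\gamma$ so that $E[p] = (1+\epsilon)\gamma$; the distributions are made up for illustration and `reverse_markov_holds` is a hypothetical name.

```python
import random

def reverse_markov_holds(n, eps, rng):
    """Draw a random law for p on {0, ..., n}, set gamma = E[p] / (1 + eps), test the tail bound."""
    weights = [rng.random() for _ in range(n + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(value * q for value, q in enumerate(probs))
    gamma = mean / (1 + eps)  # so that E[p] >= (1 + eps) * gamma holds with equality
    tail = sum(q for value, q in enumerate(probs) if value > gamma)
    return tail >= eps * gamma / n - 1e-12

rng = random.Random(11)
assert all(reverse_markov_holds(rng.randint(1, 20), rng.uniform(0.01, 1.0), rng) for _ in range(5000))
print("Pr[p > gamma] >= eps * gamma / n on all samples")
```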